Mesh and Field Data Generation
MACSio can generate a wide variety of mesh and variable data, as well as the amorphous metadata typical of HPC multi-physics applications. Currently, MACSio can generate structured, rectilinear, unstructured and arbitrary polyhedral, multi-block meshes in 2 or 3 dimensions. Regardless of the type of mesh MACSio generates for I/O performance testing, it stores and marshals all of the resulting data in a single, all-encompassing JSON-C object that is passed around within MACSio and between MACSio and its I/O plugins.
MACSio employs a very simple algorithm to generate and then decompose a mesh in parallel. Even so, the decomposition is general enough to create multiple mesh pieces on individual MPI ranks and to let the number of mesh pieces vary somewhat between MPI ranks. At present, there is no support for explicitly specifying a particular arrangement of mesh pieces and MPI ranks. However, such an enhancement is planned for the near future.
Once the global whole mesh shape is determined as a count of total pieces and as counts of pieces in each of the logical dimensions, MACSio uses a very simple algorithm to assign mesh pieces to MPI ranks. The global list of mesh pieces is numbered starting from 0. First, the number of pieces to assign to rank 0 is chosen. When the average piece count per rank is non-integral, it lies between two integers, K and K+1. So, MACSio randomly chooses either K or K+1 pieces, being careful to weight the randomness so that once all pieces are assigned to all ranks, the target average piece count per rank is achieved. MACSio then assigns the next K or K+1 numbered pieces to the next MPI rank. It continues assigning pieces to MPI ranks, in piece number order, until all MPI ranks have been assigned pieces. The algorithm runs identically on all ranks. When the algorithm reaches the piece assignment for the rank on which it is executing, it generates the K or K+1 mesh pieces for that rank. Although the algorithm is essentially sequential with asymptotic behavior O(#total pieces), it is primarily a simple book-keeping loop which completes in a fraction of a second even for more than one million pieces.
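The assignment loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not MACSio's actual code: the names `sketch_next` and `sketch_assign_pieces` are hypothetical, the weighting is done by selection sampling (choose K+1 with probability equal to the fraction of remaining ranks that still need an extra piece), and a small linear congruential generator stands in for whatever random source MACSio really uses. It was chosen only because every rank can replay it identically from a shared seed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of MACSio): a tiny LCG so that every rank
   replays exactly the same sequence from the same seed. */
uint32_t sketch_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state >> 8; /* discard the weakest low-order bits */
}

/* Hypothetical sketch: walk the ranks in order, handing each one K or K+1
   consecutive piece ids, and report the range that belongs to my_rank. */
void sketch_assign_pieces(int total_pieces, int nranks, int my_rank,
                          uint32_t seed, int *first_piece, int *piece_count)
{
    int rank, next_piece = 0;
    uint32_t state = seed;

    assert(nranks > 0 && total_pieces >= 0);
    assert(my_rank >= 0 && my_rank < nranks);

    for (rank = 0; rank < nranks; rank++)
    {
        int ranks_left  = nranks - rank;
        int pieces_left = total_pieces - next_piece;
        int k           = pieces_left / ranks_left;
        int extra       = pieces_left % ranks_left; /* ranks still owed k+1 */
        int count       = k;

        /* Choose k+1 with probability extra/ranks_left. Because the weights
           are recomputed from what remains, the total always comes out to
           exactly total_pieces. */
        if ((int)(sketch_next(&state) % (uint32_t)ranks_left) < extra)
            count = k + 1;

        if (rank == my_rank)
        {
            *first_piece = next_piece;
            *piece_count = count;
            return;
        }
        next_piece += count;
    }
}
```

With 10 pieces on 4 ranks, for example, every rank that calls this with the same seed receives 2 or 3 consecutive piece ids, and the counts sum to 10. Unlike the description above, the sketch tracks only counts, so its loop runs over ranks rather than over individual pieces.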
Each piece of the mesh is a simple rectangular region of space. The spatial bounds of that region are easily determined. Any variables to be placed on the mesh can be easily handled as long as the variable’s spatial variation can be described in the global geometric space.
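To make the idea of easily determined bounds concrete, here is a small sketch of how a piece number could be mapped to a rectangular region. Everything in it is an assumption made for illustration: the type `sketch_bounds_t`, the function `sketch_piece_bounds`, the numbering of pieces with the first logical dimension varying fastest, uniform piece extents, and a global origin at zero. MACSio's own layout may differ.

```c
#include <assert.h>

/* Hypothetical type: axis-aligned bounds of one mesh piece. */
typedef struct
{
    double lo[3];
    double hi[3];
} sketch_bounds_t;

/* Hypothetical sketch: convert a global piece id into logical indices
   (first dimension fastest) and then into spatial bounds, assuming every
   piece has the same extent and the global mesh starts at the origin. */
sketch_bounds_t sketch_piece_bounds(int piece_id, const int npieces[3],
                                    const double piece_extent[3])
{
    sketch_bounds_t b;
    int idx[3], d;

    assert(npieces[0] > 0 && npieces[1] > 0 && npieces[2] > 0);
    assert(piece_id >= 0 && piece_id < npieces[0] * npieces[1] * npieces[2]);

    idx[0] = piece_id % npieces[0];
    idx[1] = (piece_id / npieces[0]) % npieces[1];
    idx[2] = piece_id / (npieces[0] * npieces[1]);

    for (d = 0; d < 3; d++)
    {
        b.lo[d] = idx[d] * piece_extent[d];
        b.hi[d] = b.lo[d] + piece_extent[d];
    }
    return b;
}
```

Given such bounds, a variable defined as a function of global coordinates can be evaluated at any node or zone of the piece without reference to neighboring pieces.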
Data API
group MACSIO_DATA
Data (Mesh) Generation.
Functions
- struct json_object *MACSIO_DATA_GenerateTimeZeroDumpObject(struct json_object *main_obj, int *rank_owning_chunkId)
  - Parameters
    main_obj: The main JSON object holding mesh, field, amorphous data
    rank_owning_chunkId: missing this info
- int MACSIO_DATA_GetRankOwningPart(struct json_object *main_obj, int chunkId)
  Given a chunkId, return rank of owning task.
- int MACSIO_DATA_ValidateDataRead(struct json_object *main_obj)
  Not yet implemented.
- int MACSIO_DATA_SimpleAssignKPartsToNProcs(int k, int n, int my_rank, int *my_part_cnt, int **my_part_ids)
  Not yet implemented.
- struct json_object *MACSIO_DATA_EvolveDataset(struct json_object *main_obj, int *dataset_evolved, float factor, int growth_bytes)
  Vary mesh and field data including changing not only values but sizes.
Random (Amorphous) Object Generation
In addition to the data needed to represent the main mesh and fields MACSio produces, HPC multi-physics codes commonly generate modest amounts of amorphous metadata dealing with such things as materials, material models and material property tables, user input controls, performance tracking information, library versions and data provenance, debugging information, etc. This kind of information often takes the form of a large hierarchy of key-value pairs where the values can sometimes be arrays whose size is on the order of the number of tasks, etc.
Two methods in MACSio’s data generation package help to create some random but nonetheless statistically similar kinds of data.
group MACSIO_RANDOBJ
Construct a random JSON object.
Functions
- struct json_object *MACSIO_DATA_MakeRandomObject(int maxd, int nraw, int nmeta, unsigned maxds)
  - Parameters
    maxd: maximum depth of object tree
    nraw: bytes of raw data (in extarr objects) in final object
    nmeta: bytes of meta data (in non-extarr objects) in final object
    maxds: maximum size of any given raw (extarr) object
- struct json_object *MACSIO_DATA_MakeRandomTable(int nrecs, int totbytes)
  Construct a random table of some number of random JSON objects.
  - Parameters
    nrecs: number of records in the table
    totbytes: total bytes in the table
Randomization
Randomization of various MACSio behaviors may be explicitly enforced or prevented. To guarantee either outcome, random number generation must be handled carefully, especially where reproducibility from run to run, agreement across parallel tasks, or both are required.
For this reason, MACSio creates and initializes several default pseudo random number generators (PRNGs), all of which use the C library random() function.
PRNG support is handled by maintaining a series of state vectors for calls to initstate/setstate of C library random() and friends. Multiple different PRNGs are supported, each having zero or more special properties depending on how it is initialized (e.g. the seed used from run to run and/or from task to task) as well as how it is expected to be used/called. For example, if a PRNG is intended to produce identical values on each task, then it is up to the caller to ensure the respective PRNG is called consistently (e.g. collectively) across tasks. Otherwise, unintended or indeterminate behavior may result.
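The state-vector approach can be illustrated with a short sketch built directly on the POSIX `initstate()`, `setstate()` and `random()` calls. It is not MACSio's implementation: the `sketch_` names, the fixed table of 8 generators and the 64-byte state size are all assumptions chosen to keep the example small. Each generator owns its own state array, and every operation swaps that array in, does its work, and then restores whichever state was active before.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600 /* expose random(), initstate(), setstate() */
#endif
#include <assert.h>
#include <stdlib.h>

#define SKETCH_MAX_PRNGS   8
#define SKETCH_STATE_BYTES 64

/* One private state vector per generator (declared as long for alignment). */
static long     sketch_states[SKETCH_MAX_PRNGS][SKETCH_STATE_BYTES / sizeof(long)];
static unsigned sketch_seeds[SKETCH_MAX_PRNGS];
static int      sketch_used[SKETCH_MAX_PRNGS];

/* Hypothetical: create a generator and return its id, or -1 if the table is full. */
int sketch_create_prng(unsigned seed)
{
    int id;
    for (id = 0; id < SKETCH_MAX_PRNGS; id++)
    {
        if (!sketch_used[id])
        {
            /* initstate() seeds the new vector and makes it current;
               it returns the previously active vector, which we restore. */
            char *prev = initstate(seed, (char *)sketch_states[id], SKETCH_STATE_BYTES);
            setstate(prev);
            sketch_used[id]  = 1;
            sketch_seeds[id] = seed;
            return id;
        }
    }
    return -1;
}

/* Hypothetical: draw the next value from one generator without
   disturbing any other generator's sequence. */
long sketch_get_val(int id)
{
    char *prev;
    long val;
    assert(id >= 0 && id < SKETCH_MAX_PRNGS && sketch_used[id]);
    prev = setstate((char *)sketch_states[id]);
    val  = random();
    setstate(prev);
    return val;
}

/* Hypothetical: restart a generator's sequence by re-seeding its vector. */
void sketch_reset_prng(int id)
{
    char *prev;
    assert(id >= 0 && id < SKETCH_MAX_PRNGS && sketch_used[id]);
    prev = initstate(sketch_seeds[id], (char *)sketch_states[id], SKETCH_STATE_BYTES);
    setstate(prev);
}
```

Because each generator's position lives in its own array, two generators created with the same seed yield the same sequence no matter how calls to them are interleaved. That independence is the property that makes it possible to keep one stream in lockstep across tasks while another is consumed freely.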
The default PRNGs MACSio creates are…
- Naive (MD_random_naive)
  The naive PRNG is fine for any randomization that does not need special behaviors. This is the one that is expected to be used most of the time. Think of it as a wholesale replacement for using C library random() directly. MD_random is a shorthand alias for MD_random_naive.
- Naive, Time Varying (MD_random_naive_tv)
  Like MD_random_naive except that it is seeded based on current time and will therefore ensure variation from run to run (i.e. is time varying). It should be used in place of MD_random_naive where randomization from run to run is needed.
- Naive, Rank Varying (MD_random_naive_rv)
  Like MD_random_naive but ensures variation across tasks (i.e. is rank varying) by seeding based on task rank. It should be used in place of MD_random_naive where randomization from task to task is needed.
- Naive, Rank and Time Varying (MD_random_naive_rtv)
  Like MD_random_naive_rv but also ensures results vary from rank to rank and run to run.
- Rank Invariant (MD_random_rankinv)
  Is seeded based on neither rank nor time, and must not be used by a caller in a way that would cause inconsistency across ranks. It is up to the caller to ensure it is called consistently (e.g. collectively) across ranks. Otherwise indeterminate behavior may result.
- Rank Invariant, Time Varying (MD_random_rankinv_tv)
  Like MD_random_rankinv except ensures variation in randomization from run to run. So, for each given run, all ranks will agree on randomization but from run to run, that agreement will vary.
If these PRNGs are not sufficient, developers are free to create other PRNGs for other specific purposes using MACSIO_DATA_CreatePRNG().
PRNG API
group MACSIO_PRNGS
Randomization and Pseudo Random Number Generators.
Defines
- MD_random_naive()
  Get next random value from naive PRNG. Same as MD_random().
  The naive PRNG is fine for any randomization that does not need special behaviors. This is the one that is expected to be used most of the time.
- MD_random()
  Convenience shorthand for MD_random_naive().
- MD_random_naive_tv()
  Get next random value from naive, time-variant PRNG.
  The naive_tv PRNG is like naive except that it is seeded based on current time and so will change from run to run (i.e. is time-variant). It should be used in place of MD_random_naive() when randomization from run to run is required.
- MD_random_naive_rv()
  Get next random value from naive, rank-variant PRNG.
  Like MD_random_naive() but guarantees variation across ranks.
- MD_random_naive_rtv()
  Get next random value from naive, rank- and time-variant PRNG.
  Like MD_random_naive_rv() but guarantees variation from run to run.
- MD_random_rankinv()
  Get next random value from rank-invariant PRNG.
  The rank_invariant PRNG is not seeded based on rank and must not be used in a way that would cause inconsistency across ranks. Failing to call it collectively will likely cause indeterminate behavior. There is no enforcement of this requirement. However, a sanity check is performed at application termination: a collective MPI_BXOR reduce is applied to the current value from this PRNG and, if the result is non-zero, an error message is generated in the logs.
- MD_random_rankinv_tv()
  Get next random value from rank-invariant but time-variant PRNG.
  Like MD_random_rankinv() except causes variation in randomization from run to run.
Functions
- int MACSIO_DATA_CreatePRNG(unsigned seed)
  Create a Pseudo Random Number Generator (PRNG).
  - Parameters
    seed: seed for initializing random number generator
  - Return
    An integer identifier for the created PRNG
- void MACSIO_DATA_ResetPRNG(int prng_id)
  Reset a PRNG. This will cause the specified PRNG to restart the sequence of pseudo random numbers it generates.
  - Parameters
    prng_id: identifier for the PRNG to reset
- long MACSIO_DATA_GetValPRNG(int prng_id)
  Get next value from PRNG.
  - Parameters
    prng_id: identifier of the desired PRNG to get a value from
  - Return
    Next value from the PRNG sequence. Logically equivalent to calling random().
- void MACSIO_DATA_DestroyPRNG(int prng_id)
  Free resources associated with a PRNG.
  - Parameters
    prng_id: identifier of the desired PRNG to free
- void MACSIO_DATA_InitializeDefaultPRNGs(unsigned mpi_rank, unsigned curr_time)
  Initialize default Pseudo Random Number Generators (PRNGs). Should be called near the beginning of application startup. Creates the (default) PRNGs described here. Applications are free to create others as needed.
  - Parameters
    mpi_rank: unsigned representation of MPI rank of calling processor
    curr_time: unsigned but otherwise arbitrary representation of current time, which is equal on all ranks yet guaranteed to vary from run to run.
- void MACSIO_DATA_FinalizeDefaultPRNGs(void)
  Free up resources for default PRNGs. Should be called near the termination of the application.
PRNG Example (tstprng.c)
#define _GNU_SOURCE /* needed for strcasestr() on glibc */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#ifdef HAVE_MPI
#include <mpi.h>
#endif
#include <macsio_data.h>

int main(int argc, char **argv)
{
    int i, id1, id2, id3, id5;
    int parallel = 0, ckfail = 0;
    long tseed;
    long series1[100], series2[100], series3[100], series4[100][5], series5[100][5];
    int rank=0, size=1;
    struct timeval tv;

    /* strcasestr() returns non-NULL when the option is present */
    for (i = 1; i < argc; i++)
    {
        if (strcasestr(argv[i], "--parallel"))
            parallel = 1;
        else if (strcasestr(argv[i], "--ckfail"))
            ckfail = 1;
    }

#ifdef HAVE_MPI
    if (parallel)
    {
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    }
#endif

    /* Build a time-based seed and set up the default PRNGs */
    gettimeofday(&tv, 0);
    tseed = (long) (((unsigned int) (tv.tv_sec + tv.tv_usec)) & 0xFFFFFFFF);
    MACSIO_DATA_InitializeDefaultPRNGs(rank, tseed);

    id1 = MACSIO_DATA_CreatePRNG(0xDeadBeef);
    id2 = MACSIO_DATA_CreatePRNG(tseed); /* time varying output */
    id3 = MACSIO_DATA_CreatePRNG(0xBabeFace);
    id5 = MACSIO_DATA_CreatePRNG(0xDeadBeef);

    for (i = 0; i < 100; i++)
    {
        /* id4 is created fresh, with the same seed, on every iteration */
        int j, id4 = MACSIO_DATA_CreatePRNG(0x0BedFace);

        /* Interleave calls to several PRNGs at different rates */
        series1[i] = MACSIO_DATA_GetValPRNG(id1);
        series2[i] = i%3?0:MACSIO_DATA_GetValPRNG(id2);
        series3[i] = i%5?0:MACSIO_DATA_GetValPRNG(id3);
        for (j = 0; j < 5; j++)
            series4[i][j] = MACSIO_DATA_GetValPRNG(id4);
        MACSIO_DATA_DestroyPRNG(id4);

        /* id5 is reset after every 5 values, so each row should repeat */
        for (j = 0; j < 5; j++)
            series5[i][j] = MACSIO_DATA_GetValPRNG(id5);
        MACSIO_DATA_ResetPRNG(id5);

        /* Restart id2 once, at i == 50 */
        if (i && !(i%50))
            MACSIO_DATA_ResetPRNG(id2);

        /* Exercise the default PRNGs */
        MD_random();
        MD_random_rankinv();
        MD_random_rankinv_tv();

        /* With --ckfail, rank 0 makes extra calls so that the
           rank-invariant PRNGs fall out of step across ranks */
        if (ckfail && rank == 0 && i < 5)
        {
            MD_random_rankinv();
            MD_random_rankinv_tv();
        }
    }

    /* After its reset, id2 should replay the values it produced from i == 0 */
    if (memcmp(&series2[0], &series2[51], 40*sizeof(long)))
        return 1;
    /* Re-created (series4) and reset (series5) PRNGs should repeat each row */
    if (memcmp(&series4[0], &series4[1], 5*sizeof(long)))
        return 1;
    if (memcmp(&series4[1], &series4[23], 5*sizeof(long)))
        return 1;
    if (memcmp(&series5[0], &series5[1], 5*sizeof(long)))
        return 1;
    if (memcmp(&series5[1], &series5[23], 5*sizeof(long)))
        return 1;

    MACSIO_DATA_DestroyPRNG(id1);
    MACSIO_DATA_DestroyPRNG(id2);
    MACSIO_DATA_DestroyPRNG(id3);
    MACSIO_DATA_DestroyPRNG(id5);

#ifdef HAVE_MPI
    if (parallel)
    {
        /* Lets confirm the rank-invariant PRNGs agree and issue a message if not */
        unsigned rand_check[2] = {MD_random_rankinv(), MD_random_rankinv_tv()};
        /* MPI_Reduce delivers the result only to rank 0; zero-initialize so
           the assert below is well defined on the other ranks */
        unsigned rand_check_r[2] = {0, 0};
        MPI_Reduce(rand_check, rand_check_r, 2, MPI_UNSIGNED, MPI_BXOR, 0, MPI_COMM_WORLD);
        assert(rand_check_r[0] == 0 && rand_check_r[1] == 0);
    }
#endif

    MACSIO_DATA_FinalizeDefaultPRNGs();

#ifdef HAVE_MPI
    if (parallel)
        MPI_Finalize();
#endif

    return 0;
}