Problem Statement |
| A company has recently released a large database. Each record in the database
corresponds to one user. There are N public fields which do not contain
sensitive information, and M private fields which do. The company claims that
it has added noise to anonymize the database and that there is no way to
recover accurate private values from unperturbed public information. You're
not so sure...
You will be given the perturbed database and unperturbed public information
for a number of users (but not everyone). Your task is to return your best
guess as to the unperturbed private values of the users whose public
information you know.
The data will be generated by first picking N and M. Then, C gaussian
distributions will be generated in N+M dimensional space. The mean and
standard deviation of each dimension of each gaussian will be selected
independently and uniformly from [-1,1] and [0,1]. To generate each of U
users, one of the C gaussians will be selected at random, and a sample will be
pulled from it. Finally, the data will be perturbed by adding gaussian noise to
every value, sampled from a distribution with mean 0 and standard deviation
SD. You will then be given the unperturbed public information about Q users
(selected randomly from the U) and are to return your best guess as to their
unperturbed private information.
Your raw score will be the mean squared error of your return. This will be
compared to a baseline measure which, for every user whose unperturbed public
information is given, takes the expected value of the private information
(given the C distributions used to generate the data). Your score will be
your improvement over this baseline (in mean squared error). Your overall
score will simply be the sum of these scores, with any negative values
increased to 0.
You must implement a method deanonymize. Your method will take
N, M and SD as described
above. The parameter database will give you U records, where field
j of record i is given by
database[i*(N+M)+j] and the first N
fields are the public ones. users will give you the unperturbed users whose
data we are interested in, where public field j of user i is
given by users[i*N+j]. You should return an array with
Q*M elements, grouped by user in the same way that the users
array is.
An offline
tester is also available.
|
|
Definition |
| Class: | Deanonymize | Method: | deanonymize | Parameters: | int, int, double[], double[], double | Returns: | double[] | Method signature: | double[] deanonymize(int N, int M, double[] database, double[] users, double SD) | (be sure your method is public) |
|
|
|
|
Notes |
- | The memory limit is 2048MB, however you may experience performance degradation due to swapping when using more than 1024MB. |
- | The time limit is 20 seconds. |
- | There are 100 provisional test cases. |
|
Constraints |
- | In generating the parameters, rand() is a uniform random variable in [0,1], and floor is used for rounding to integers when necessary. |
- | N will be 100rand() |
- | M will be 100rand() |
- | C will be 1000rand() |
- | U will be 100+100000rand() |
- | Q will be U/10 |
- | SD will be (rand()+0.5)*min(0.5,U-1/N) |
|
Examples |
0) | |
| | Returns: "Seed = 1
N = 15
M = 6
SD = 0.6794676326705618
U = 146
Q = 14
" | |
|
1) | |
| | Returns: "Seed = 2
N = 7
M = 1
SD = 0.335403841291766
U = 682
Q = 68
" | |
|
2) | |
| | Returns: "Seed = 3
N = 43
M = 1
SD = 0.6342912229585391
U = 653
Q = 65
" | |
|
3) | |
| | Returns: "Seed = 4
N = 55
M = 2
SD = 0.7483313334749946
U = 947
Q = 94
" | |
|
4) | |
| | Returns: "Seed = 5
N = 7
M = 1
SD = 0.34758487121050863
U = 9770
Q = 977
" | |
|
5) | |
| | Returns: "Seed = 6
N = 92
M = 17
SD = 0.3416127850569974
U = 146
Q = 14
" | |
|
6) | |
| | Returns: "Seed = 7
N = 5
M = 4
SD = 0.20131294912571285
U = 697
Q = 69
" | |
|
7) | |
| | Returns: "Seed = 8
N = 6
M = 10
SD = 0.44062796822464706
U = 111
Q = 11
" | |
|
8) | |
| | Returns: "Seed = 9
N = 22
M = 46
SD = 0.533073548894668
U = 27926
Q = 2792
" | |
|
9) | |
| | Returns: "Seed = 10
N = 3
M = 1
SD = 0.038548775274140114
U = 18476
Q = 1847
" | |
|
This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2020, TopCoder, Inc. All rights reserved.