This article is excerpted from Chapter 3 of Rex Black’s book Managing the Testing Process, 3e.
A number of RBCS clients find that obtaining good test data poses many challenges. For any large-scale system, testers usually cannot create sufficient and sufficiently diverse test data by hand; i.e., one record at a time. While data-generation tools exist and can create almost unlimited amounts of data, the data so generated often do not exhibit the same diversity and distribution of values as production data. For these reasons, many of our clients consider production data ideal for testing, particularly for systems where large sets of records have accumulated over years of use with various revisions of the systems currently in use, and systems previously in use.
To use production data, however, we must preserve privacy. Production data often contain personal information about individuals, which must be handled securely. Yet requiring secure data handling during testing activities imposes undesirable inefficiencies and constraints. Therefore, many organizations want to anonymize (scramble) the production data before using the data for testing.
This anonymization process leads to the next set of challenges, though. The anonymization must occur securely, in the sense that it is not reversible should the data fall into the wrong hands. For example, simply substituting the next digit or the next letter in sequence would be obvious to anyone: it doesn’t take long to deduce that "Kpio Cspxo" is actually "John Brown," which makes the de-anonymization process trivial.
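To make the weakness concrete, here is a minimal Python sketch of the next-letter, next-digit substitution described above. The function name `shift_text` and its `offset` parameter are illustrative assumptions, not taken from the book; the point is only that the same routine that scrambles the data also unscrambles it when run with the opposite offset.

```python
import string


def shift_text(text: str, offset: int) -> str:
    """Shift each letter and digit by `offset` positions, wrapping around.

    Hypothetical helper for illustration only; this is the weak scheme
    the article warns against, not a recommended anonymizer.
    """
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + offset) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + offset) % 26 + ord("A")))
        elif ch in string.digits:
            out.append(chr((ord(ch) - ord("0") + offset) % 10 + ord("0")))
        else:
            # Spaces and punctuation pass through unchanged.
            out.append(ch)
    return "".join(out)


scrambled = shift_text("John Brown", 1)   # the "anonymized" value
recovered = shift_text(scrambled, -1)     # reversing it takes one call
print(scrambled, "->", recovered)
```

Because there are only 25 possible non-zero letter offsets, someone who obtains the scrambled data can try them all in moments, even without knowing which offset was used.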