platform_build_soong/scripts/hiddenapi/merge_csv.py

#!/usr/bin/env python
#
# Copyright (C) 2018 The Android Open Source Project
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Merge multiple CSV files, possibly with different columns.
"""

import argparse
import csv
import io
import heapq
import itertools
import operator

from zipfile import ZipFile

args_parser = argparse.ArgumentParser(
    description='Merge given CSV files into a single one.'
)
args_parser.add_argument(
    '--header',
    help='Comma separated field names; '
    'if missing determines the header from input files.',
)
args_parser.add_argument(
    '--zip_input',
    help='Treat files as ZIP archives containing CSV files to merge.',
    action="store_true",
)
args_parser.add_argument(
    '--key_field',
    help='The name of the field by which the rows should be sorted. '
    'Must be in the field names. '
    'Will be the first field in the output. '
    'All input files must be sorted by that field.',
)
args_parser.add_argument(
    '--output',
    help='Output file for merged CSV.',
    default='-',
    type=argparse.FileType('w'),
)
args_parser.add_argument('files', nargs=argparse.REMAINDER)
args = args_parser.parse_args()


def dict_reader(csvfile):
    """Reads comma-separated rows as dicts, with '|' as the quote character."""
    return csv.DictReader(csvfile, delimiter=',', quotechar='|')


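csv_readers = []
if not args.zip_input:
    # Each input is a plain CSV file. The readers are consumed lazily further
    # down, so the files stay open until the script exits.
    for file in args.files:
        csv_readers.append(dict_reader(open(file, 'r')))
else:
    # Each input is a ZIP archive; read every embedded '.uau' entry as CSV.
    for file in args.files:
        with ZipFile(file) as zipfile:
            for entry in zipfile.namelist():
                if entry.endswith('.uau'):
                    csv_readers.append(
                        dict_reader(io.TextIOWrapper(zipfile.open(entry, 'r')))
                    )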

if args.header:
    fieldnames = args.header.split(',')
else:
    headers = {}
    # Build union of all columns from source files:
    for reader in csv_readers:
        for fieldname in reader.fieldnames:
            headers[fieldname] = ""
    fieldnames = list(headers.keys())

# By default chain the csv readers together so that the resulting output is
# the concatenation of the rows from each of them:
all_rows = itertools.chain.from_iterable(csv_readers)

if csv_readers:
    key_field = args.key_field
    if key_field:
        assert key_field in fieldnames, (
            "--key_field {} not found, must be one of {}\n"
        ).format(key_field, ",".join(fieldnames))
        # Make the key field the first field in the output
        key_field_index = fieldnames.index(key_field)
        fieldnames.insert(0, fieldnames.pop(key_field_index))
        # Create an iterable that lazily merges the csv readers, ordering the
        # rows by the key field. This is only the merge step of a merge sort:
        # each input must already be sorted by that field.
        all_rows = heapq.merge(*csv_readers, key=operator.itemgetter(key_field))

# Write all rows from the input files to the output:
writer = csv.DictWriter(
    args.output,
    delimiter=',',
    quotechar='|',
    quoting=csv.QUOTE_MINIMAL,
    dialect='unix',
    fieldnames=fieldnames,
)
writer.writeheader()

# Read all the rows from the input and write them to the output in the correct
# order:
for row in all_rows:
    writer.writerow(row)
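

# ---------------------------------------------------------------------------
# Illustrative sketch only; not part of the original script. It restates the
# --key_field merge above as a self-contained function over in-memory CSV
# text, so the behaviour can be tried without files or command-line flags.
# The name `merge_csv_texts` and the sample rows are hypothetical. As in the
# script, every input is assumed to be already sorted by the key field.
import csv
import heapq
import io
import operator


def merge_csv_texts(texts, key_field):
    """Merges CSV texts, each sorted by key_field, into one CSV string."""
    readers = [
        csv.DictReader(io.StringIO(text), delimiter=',', quotechar='|')
        for text in texts
    ]
    # Union of all columns in first-seen order, then move the key field first.
    seen = {}
    for reader in readers:
        for name in reader.fieldnames:
            seen[name] = ''
    names = list(seen)
    names.insert(0, names.pop(names.index(key_field)))
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        delimiter=',',
        quotechar='|',
        quoting=csv.QUOTE_MINIMAL,
        dialect='unix',
        fieldnames=names,
    )
    writer.writeheader()
    # heapq.merge interleaves inputs that are already sorted; it does not
    # sort them. Columns missing from a row are written as empty strings.
    for row in heapq.merge(*readers, key=operator.itemgetter(key_field)):
        writer.writerow(row)
    return out.getvalue()


# Example with made-up rows:
#   merge_csv_texts(
#       ['signature,a\nLa;,1\nLc;,2\n', 'signature,b\nLb;,x\n'], 'signature')
# returns 'signature,a,b\nLa;,1,\nLb;,,x\nLc;,2,\n'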

Change history (from the blame annotations, oldest first):

2021-02-08 19:08:09 +01:00
Move hiddenapi tools used by build/soong from frameworks/base
Also, creates a python_binary_host module for generate_hiddenapi_lists and uses that when constructing the build rule rather than using the file directly.
Bug: 177317659
Test: m droid
Verified that hiddenapi files (both aggregated ones and for the individual modules) are not affected by this change.
Change-Id: Ia11bb203ce5a74740d35f1b7e86716e15aad336e

2021-02-12 12:46:42 +01:00
Allow explicitly specified additional annotations for hiddenapi
Adds the hiddenapi_additional_annotations to allow a library to list the libraries that provided additional hiddenapi related annotations for a library. Modifies merge_csv.py so it can process multiple zip files at the same time and uses that to merge the embedded .uau files from a module and those it depends upon.
Bug: 180102243
Test: m droid
Verified that hiddenapi files (both aggregated ones and for the individual modules) are not affected by this change.
Change-Id: I796520021c7357398a9e2a09f1029e4a578b05b3

2021-02-16 17:57:06 +01:00
Sort hiddenapi monolithic files by signature
Adds a new --key_field option to merge_csv.py which specifies the name of the field that should be used to sort the input. If specified it causes that field to be the first in each row and performs the merge operation of a merge sort on the input files. That assumes that each input file is already sorted into the same order. Modifies the rules that use merge_csv.py to pass in: --key_field signature to sort the rows by signature.
Bug: 180387396
Test: Verified that hiddenapi files (both aggregated ones and for the individual modules) are not affected by this change other than changing the order.
Change-Id: Idcd5f0fea373b520b604889e1c280f21ed495660

2021-06-08 16:41:32 +02:00
Maintain header order in merge_csv
Previously, if the --header property was not specified then merge_csv would use a header constructed by sorting all the fields in the input files. That required that any use of merge_csv which did not already have headers in the required order would have to explicitly specify the headers. That made it harder to use merge_csv as a generic tool as each invocation needed to be aware of what headers were exported in the output. This change causes merge_csv to simply use the headers in the order in which they are encountered in the input files. That removes the need to specify the --header option when generating the index files.
Bug: 179354495
Test: m out/soong/hiddenapi/hiddenapi-index.csv out/soong/hiddenapi/hiddenapi-unsupported.csv - make sure that they are not changed by this change.
Change-Id: I420b7d07aea85af6372cd7580a8be5e2cc82a513

2021-08-25 19:47:43 +02:00
Apply pylint to remaining scripts in hiddenapi
1. Run pyformat scripts/hiddenapi -s 4 --force_quote_type none -i to fix formatting.
2. Rename restricted variable names (e.g. variable name "input" has been changed to "csvfile").
3. Use pylint: disable=<X> where fixes are not obvious.
Test: m merge_csv signature_patterns signature_patterns_test
Test: pylint --rcfile tools/repohooks/tools/pylintrc <file1> <file1_test>
Bug: 195738175
Change-Id: I800a208f9c0ee1d32e68e4b20fd5933b3ab92c0e