Much of the data that is available from the Data API represents a total for a region, such as the population of a city or the number of graduates from a university. Even so, there are many situations when data needs to be further summarized. For example, it may be necessary to summarize the population of two counties. It may be necessary to summarize the number of graduates from all universities in a geographic region. It may be preferable to just obtain a count of point-level data rather than seeing all of the data itself.
Custom geographies are another common scenario where summarization is needed. Custom geographies, such as drive time or drive distance searches, radius searches, or custom GeoJSON boundaries, generally qualify a series of census block groups. A single drive time search may qualify thousands of these block groups. While it is possible to view the data for each of these block groups, more commonly, users want to see a summary total for a datapoint for all block groups.
There are two ways to summarize data in the Data API: automatic summarization and custom summarization. Most users will prefer automatic summarization; however, both approaches are available.
Automatic Summarization
Automatic summarization provides the simplest and most convenient method of summarizing data. Automatic summarization avoids the complexities of how each data point must be summarized. Some data can simply be added together; however, other data points, like percentages and medians, require more complex handling to avoid nonsensical or incorrect results. Automatic summarization allows the API to choose the correct method for summarizing data points.
In the following examples, a request asks for the total male population and the male population percentage for a combination of two cities. The first example does not request any summarization, so the results for each city are shown separately. The second example requests that the data be summarized.
var request = {
  "data": {
    "datapoints": ["dem.acs.pop.male.val", "dem.acs.pop.male.pct"]
  },
  "criteria": {
    "geography": {
      "combined": ["city:burlington-vt", "city:south-burlington-vt"]
    }
  }
};

var response = {
  "resultset": {
    "geography": "custom:custom01",
    "data": [
      {
        "datapoint": "dem.acs.pop.male.val",
        "periods": ["2017"],
        "source": "acs5",
        "geographies": [
          { "geography": "city:burlington-vt", "values": [20773] },
          { "geography": "city:south-burlington-vt", "values": [8997] }
        ]
      },
      {
        "datapoint": "dem.acs.pop.male.pct",
        "periods": ["2017"],
        "source": "acs5",
        "geographies": [
          { "geography": "city:burlington-vt", "values": [0.4893] },
          { "geography": "city:south-burlington-vt", "values": [0.4793] }
        ]
      }
    ]
  }
};
In this next example, the use of the "summary" key causes the datapoints to be summarized using the appropriate method for each. The male population can simply be added; however, the combined percentage requires also knowing the combined total population, which is not one of the requested datapoints. The API will internally retrieve this additional data and correctly determine the combined percentage.
var request = {
  "data": {
    "datapoints": ["dem.acs.pop.male.val", "dem.acs.pop.male.pct"]
  },
  "criteria": {
    "geography": {
      "combined": ["city:burlington-vt", "city:south-burlington-vt"]
    }
  },
  "summary": "auto"
};

var response = {
  "resultset": {
    "geography": "custom:custom01",
    "summary": [
      {
        "data": [
          {
            "datapoint": "summarize(dem.acs.pop.male.val)",
            "periods": ["2017"],
            "source": "acs5",
            "values": [29770]
          },
          {
            "datapoint": "summarize(dem.acs.pop.male.pct)",
            "periods": ["2017"],
            "source": "acs5",
            "values": [0.4862]
          }
        ]
      }
    ]
  }
};
Note that the data is enclosed in a summary structure, which simply indicates that the data has been summarized rather than returned as individual datapoints.
Also note that the name of the datapoint is a function: summarize(dem.acs.pop.male.val). This function means that the returned result is not the datapoint itself, but rather an aggregation of the datapoint. Specifically, the summarize() function tells the API to summarize the datapoint using whatever methodology is appropriate for the datapoint.
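To see why a percentage cannot simply be averaged, the following sketch recomputes the combined male percentage from the two city-level results in the first example. It only illustrates the arithmetic and is not the API's internal implementation. The helper name combinePercentage is hypothetical, and each city's total population is estimated from the rounded value and percentage, so the totals are approximations.

```javascript
// City-level results copied from the unsummarized example above.
const cities = [
  { geography: "city:burlington-vt", value: 20773, pct: 0.4893 },
  { geography: "city:south-burlington-vt", value: 8997, pct: 0.4793 }
];

// Hypothetical helper: combine percentages by weighting each one with its
// underlying total. The total is estimated here as value / pct (an
// assumption for this sketch; the API retrieves the real total itself).
function combinePercentage(parts) {
  let valueSum = 0;
  let totalSum = 0;
  for (const part of parts) {
    valueSum += part.value;
    totalSum += part.value / part.pct;
  }
  return valueSum / totalSum;
}

// Naive approach: a plain average of the two percentages.
const naiveAverage = (cities[0].pct + cities[1].pct) / 2;

// Weighted approach: combined males divided by combined population.
const combined = combinePercentage(cities);

console.log(naiveAverage.toFixed(4)); // 0.4843
console.log(combined.toFixed(4));     // 0.4862
```

The weighted result agrees with the 0.4862 returned by summarize(dem.acs.pop.male.pct), while the plain average does not, because Burlington contributes more than twice as many people as South Burlington.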
This third example requests a summarization of the block groups within a 60-minute drive time search. If a summarization were not requested, data for each of the 462 block groups would be returned. Since a summary has been requested, a single set of data is returned for the entire drive time search.
var request = {
  "data": {
    "datapoints": ["dem.acs.pop.male.val", "dem.acs.pop.male.pct"]
  },
  "criteria": {
    "geography": {
      "drivetime": 60,
      "geographicCentroid": "city:burlington-vt"
    }
  },
  "summary": "auto"
};

var response = {
  "resultset": {
    "geography": "custom:custom01",
    "summary": [
      {
        "data": [
          {
            "datapoint": "summarize(dem.acs.pop.male.val)",
            "periods": ["2017"],
            "source": "acs5",
            "values": [38588]
          },
          {
            "datapoint": "summarize(dem.acs.pop.male.pct)",
            "periods": ["2017"],
            "source": "acs5",
            "values": [0.4863]
          }
        ]
      }
    ]
  }
};
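Because summarized datapoints come back with function-style names, client code often needs to map them back to the plain datapoint names. The sketch below shows one way to do that for the drive time response above. The helpers parseDatapointName and readSummary are hypothetical and not part of the API, and the sketch assumes a single period per datapoint.

```javascript
// Summarized response copied from the drive time example above.
const response = {
  resultset: {
    geography: "custom:custom01",
    summary: [
      {
        data: [
          { datapoint: "summarize(dem.acs.pop.male.val)", periods: ["2017"], source: "acs5", values: [38588] },
          { datapoint: "summarize(dem.acs.pop.male.pct)", periods: ["2017"], source: "acs5", values: [0.4863] }
        ]
      }
    ]
  }
};

// Hypothetical helper: split "summarize(dem.acs.pop.male.val)" into the
// function name and the underlying datapoint. Plain names have no function.
function parseDatapointName(name) {
  const match = /^([a-z]+)\((.+)\)$/.exec(name);
  return match
    ? { fn: match[1], datapoint: match[2] }
    : { fn: null, datapoint: name };
}

// Hypothetical helper: build { datapoint: firstValue } from one summary entry.
// Assumes each datapoint carries exactly one period, as in the example.
function readSummary(summaryEntry) {
  const result = {};
  for (const item of summaryEntry.data) {
    const { datapoint } = parseDatapointName(item.datapoint);
    result[datapoint] = item.values[0];
  }
  return result;
}

const totals = readSummary(response.resultset.summary[0]);
console.log(totals["dem.acs.pop.male.val"]); // 38588
console.log(totals["dem.acs.pop.male.pct"]); // 0.4863
```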
Collection Summarization
Collections are often summarized using the API. Collections inherently return multiple rows of data for each geographic region. While the rows themselves are often useful, it is also common to want a summary of all of those rows.
In the following example, the airports within a 100-mile radius of the city of Burlington, VT are automatically summarized. The request names two datapoints, trn.airport.type and trn.airport.pass, along with a count. The first datapoint, trn.airport.type, represents the type of airport. It is not a numeric datapoint, so it is automatically treated as an aggregation group: each time the value of this datapoint changes, an independent summary is returned. The second datapoint, trn.airport.pass, represents the annual number of passengers. It is a numeric datapoint, so it is automatically summed across the airports in each group.
var request = {
  "data": {
    "collections": [
      {
        "collection": "trn.airport",
        "name": "Number of airports by type",
        "datapoints": ["trn.airport.type", "trn.airport.pass", "count"],
        "summary": "auto"
      }
    ]
  },
  "criteria": {
    "geography": {
      "name": "100-mile radius from Burlington, VT",
      "radius": 100,
      "geographicCentroid": "city:burlington-vt"
    }
  }
};

var response = {
  "resultset": {
    "geography": "custom:100-mile radius from Burlington, VT",
    "data": [
      {
        "collection": "trn.airport",
        "summary": [
          {
            "group": { "trn.airport.type": "Major" },
            "periods": ["2017"],
            "data": [
              { "datapoint": "trn.airport.type", "values": ["Major"] },
              { "datapoint": "summarize(trn.airport.pass)", "values": [1432793] },
              { "datapoint": "count(trn.airport)", "values": [2] }
            ]
          },
          {
            "group": { "trn.airport.type": "Regional" },
            "periods": ["2017"],
            "data": [
              { "datapoint": "trn.airport.type", "values": ["Regional"] },
              { "datapoint": "summarize(trn.airport.pass)", "values": [19710] },
              { "datapoint": "count(trn.airport)", "values": [2] }
            ]
          }
        ]
      }
    ]
  }
};
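A grouped collection summary is usually easier to consume once it is keyed by the group value. The following sketch does this for the airport result above. The helper indexByGroup is hypothetical and assumes the response shape shown in this article, with one value per datapoint.

```javascript
// Collection result copied from the response above.
const airportResult = {
  collection: "trn.airport",
  summary: [
    {
      group: { "trn.airport.type": "Major" },
      periods: ["2017"],
      data: [
        { datapoint: "trn.airport.type", values: ["Major"] },
        { datapoint: "summarize(trn.airport.pass)", values: [1432793] },
        { datapoint: "count(trn.airport)", values: [2] }
      ]
    },
    {
      group: { "trn.airport.type": "Regional" },
      periods: ["2017"],
      data: [
        { datapoint: "trn.airport.type", values: ["Regional"] },
        { datapoint: "summarize(trn.airport.pass)", values: [19710] },
        { datapoint: "count(trn.airport)", values: [2] }
      ]
    }
  ]
};

// Hypothetical helper: key each group's summary by the value of one
// aggregation-group datapoint, e.g. "Major" or "Regional".
function indexByGroup(collectionResult, groupDatapoint) {
  const index = {};
  for (const entry of collectionResult.summary) {
    const key = entry.group[groupDatapoint];
    const row = {};
    for (const item of entry.data) {
      row[item.datapoint] = item.values[0];
    }
    index[key] = row;
  }
  return index;
}

const byType = indexByGroup(airportResult, "trn.airport.type");
console.log(byType.Major["summarize(trn.airport.pass)"]); // 1432793
console.log(byType.Regional["count(trn.airport)"]);       // 2
```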
Custom Summarization
Automatic summarization is the most convenient and foolproof way to summarize the data; however, it is also possible to request custom summarization. Custom summarization allows the use of several functions and keys to fully control the aggregation process.
The following table shows the functions and keys that are available with summarizations:
Functions | Description |
summarize() | Summarize datapoints using most appropriate method |
sum() | Return the sum of the datapoints |
min() | Return the minimum value of the datapoints |
max() | Return the maximum value of the datapoints |
avg() | Return the average or mean of the datapoints |
count | Return the count of the datapoints |
Keys | |
groupby | Return a separate summary each time the value of a datapoint in the groupby clause changes. |
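To make the table concrete, the sketch below applies the same kinds of aggregation to a few rows in plain JavaScript. The rows and their values are invented for illustration and do not come from the API. The code mimics what the functions mean rather than how the API computes them, and it assumes that rows sharing a groupby value belong to one group. summarize() is left out because its method depends on the datapoint.

```javascript
// Hypothetical airport rows, invented for illustration only.
const rows = [
  { type: "Major", pass: 1000, runway: 9000 },
  { type: "Major", pass: 3000, runway: 11000 },
  { type: "Regional", pass: 200, runway: 5000 }
];

// Plain JavaScript equivalents of the functions in the table.
const aggregations = {
  sum: (values) => values.reduce((total, v) => total + v, 0),
  min: (values) => Math.min(...values),
  max: (values) => Math.max(...values),
  avg: (values) => values.reduce((total, v) => total + v, 0) / values.length,
  count: (values) => values.length
};

// Equivalent of the groupby key: one bucket of rows per distinct value.
function groupBy(items, field) {
  const groups = {};
  for (const item of items) {
    const key = item[field];
    if (!groups[key]) {
      groups[key] = [];
    }
    groups[key].push(item);
  }
  return groups;
}

const groups = groupBy(rows, "type");
const majorPass = groups.Major.map((row) => row.pass);
const majorRunway = groups.Major.map((row) => row.runway);

console.log(aggregations.sum(majorPass));         // 4000
console.log(aggregations.avg(majorPass));         // 2000
console.log(aggregations.min(majorRunway));       // 9000
console.log(aggregations.max(majorRunway));       // 11000
console.log(aggregations.count(groups.Regional)); // 1
```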
In the following example, the airports within a 100-mile radius of the city of Burlington, VT are summarized using specific functions and a groupby key.
var request = {
  "data": {
    "collections": [
      {
        "collection": "trn.airport",
        "name": "Number of airports by type",
        "datapoints": [
          "trn.airport.type",
          "sum(trn.airport.pass)",
          "avg(trn.airport.frght)",
          "min(trn.airport.runway)",
          "max(trn.airport.runway)",
          "count"
        ],
        "summary": { "groupby": "trn.airport.type" }
      }
    ]
  },
  "criteria": {
    "geography": {
      "name": "100-mile radius from Burlington, VT",
      "radius": 100,
      "geographicCentroid": "city:burlington-vt"
    }
  }
};

var response = {
  "resultset": {
    "geography": "custom:100-mile radius from Burlington, VT",
    "data": [
      {
        "collection": "trn.airport",
        "summary": [
          {
            "group": { "trn.airport.type": "Major" },
            "periods": ["2017"],
            "data": [
              { "datapoint": "trn.airport.type", "values": ["Major"] },
              { "datapoint": "sum(trn.airport.pass)", "values": [1432793] },
              { "datapoint": "avg(trn.airport.frght)", "values": [5235946.50] },
              { "datapoint": "min(trn.airport.runway)", "values": [8319] },
              { "datapoint": "max(trn.airport.runway)", "values": [11759] },
              { "datapoint": "count(trn.airport)", "values": [2] }
            ]
          },
          {
            "group": { "trn.airport.type": "Regional" },
            "periods": ["2017"],
            "data": [
              { "datapoint": "trn.airport.type", "values": ["Regional"] },
              { "datapoint": "sum(trn.airport.pass)", "values": [19710] },
              { "datapoint": "avg(trn.airport.frght)", "values": [137043.50] },
              { "datapoint": "min(trn.airport.runway)", "values": [5303] },
              { "datapoint": "max(trn.airport.runway)", "values": [6573] },
              { "datapoint": "count(trn.airport)", "values": [2] }
            ]
          }
        ]
      }
    ]
  }
};
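When several custom summaries are needed, it can help to assemble the collection entry in code rather than by hand. The sketch below builds the same datapoints list and groupby key used in the request above. The helper buildCollectionSummary and its parameters are hypothetical, and the list of allowed function names is taken only from the table in this article.

```javascript
// Function names taken from the table of summarization functions.
const SUMMARY_FUNCTIONS = ["summarize", "sum", "min", "max", "avg"];

// Hypothetical helper: build one entry for the "collections" array.
// aggregates is a list of [functionName, datapoint] pairs.
function buildCollectionSummary(collection, groupby, aggregates, includeCount) {
  const datapoints = [groupby];
  for (const [fn, datapoint] of aggregates) {
    if (!SUMMARY_FUNCTIONS.includes(fn)) {
      throw new Error("Unknown summarization function: " + fn);
    }
    datapoints.push(fn + "(" + datapoint + ")");
  }
  if (includeCount) {
    datapoints.push("count");
  }
  return { collection, datapoints, summary: { groupby } };
}

const airportSummary = buildCollectionSummary(
  "trn.airport",
  "trn.airport.type",
  [
    ["sum", "trn.airport.pass"],
    ["avg", "trn.airport.frght"],
    ["min", "trn.airport.runway"],
    ["max", "trn.airport.runway"]
  ],
  true
);

console.log(JSON.stringify(airportSummary, null, 2));
```

Rejecting unknown function names before the request is sent is a design choice of this sketch; it catches typos such as "average(...)" on the client side.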
Limitations
There are some limitations in the ability to summarize data within the API.
- Some data cannot be summarized at all, such as average yearly changes and average yearly change percentages.
- Some data can only be summarized by showing the range of values. This is especially true for many median values.
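The limitation on medians can be illustrated with a small sketch. It shows one plausible reason a range is the most that can be reported: two sets of regions can have identical medians yet different combined medians, so the combined median cannot be derived from the regional medians alone. All numbers are invented for illustration, and the helpers median and medianRange are hypothetical, not part of the API.

```javascript
// Median of a list of raw numbers, used only to demonstrate the problem.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical helper: the range that brackets the combined median.
function medianRange(medians) {
  return { min: Math.min(...medians), max: Math.max(...medians) };
}

// Two invented pairs of regions. In both pairs the regional medians are
// 2 and 20, but the underlying values differ.
const first = [[1, 2, 3], [19, 20, 21]];
const second = [[1, 2, 3], [20]];

console.log(first.map(median));  // [ 2, 20 ]
console.log(second.map(median)); // [ 2, 20 ]

// The combined medians differ even though the regional medians match.
console.log(median(first.flat()));  // 11
console.log(median(second.flat())); // 2.5

// Both combined medians fall inside the range of the regional medians.
console.log(medianRange(first.map(median))); // { min: 2, max: 20 }
```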