Behind the Bytes: Exploring the World of Digital Forensics Incident Response

Apple Fitness Workout Location Data: Leveraging the healthdb_secure.sqlite Database

Forensics Question: How can we leverage the location data stored within the healthdb_secure.sqlite database and what indications of the user’s workout activities can be gathered from the review of this data?

Tools:

DB Browser for SQLite Version 3.12.2

Background:

iOS 16 brought many changes for digital forensics and one, seemingly, unexplored area was the increased wealth of location data stored within the healthdb_secure.sqlite database. The location data is stored within the “location_series_data” table of the database for specific workout activity types initiated on the Apple Watch. The data includes: a series identifier, timestamp, latitude, longitude, altitude, speed, course, horizontal accuracy, vertical accuracy, speed accuracy, course accuracy, and signal environment. (Note: additional research is required to identify the values for the signal environment). What is somewhat surprising about the data is 1) the sheer number of location points (44,590 over 31 workouts) and 2) the accuracy of the location points. Of the total 44,590 location points, the largest horizontal accuracy was 26.791 meters and the smallest was 0.59862 meters. The “location_series_data” table by itself does not correlate the location data to the ongoing workout during location data capture.

This article will explore an initial effective, but inefficient, SQL query correlating location data to the ongoing workout; an effective and efficient SQL query correlating location data to the ongoing workout; and finally, a SQL query which provides an analysis as to the actions likely taken by the user for the initiation, pausing, and completion of workouts.

Beginning Work:

Initially, two tables of the database were used as they contained the required starting and ending points of the data being analyzed, that is the location data itself and the workout activity type. As I will show, it is possible to arrive at largely the correct outcome by creating a forced join of these two tables; however, it is extremely inefficient in processing time and results outcome. The query resulted in 44,581 rows of data which were processed in 33,030ms, or 33.03 seconds, when run through DB Browser for SQLite. If you have prior experience with DB Browser for SQLite you probably have not had to wait that long for a query result. But, to start, we will examine the tables being discussed.

The “location_series_data” table:

Here we have a first view of the “location_series_data” table of the healthdb_secure.sqlite database. (Note: latitude and longitude values have been removed).

The “workout_activities” table:

And a first view of the “workout_activities” table of the healthdb_secure.sqlite database. With iOS 16, this table now contains the starting and ending timestamps for workouts, the duration of the workouts in seconds, and the activity type which relates to the specific workout selected by the user or suggested by the Watch.

Initially, my SQL query contained all known “activity_type” values which has been reduced for depiction size for this work to the primary workouts which appear to log location data: Indoor/Outdoor Cycle, Outdoor Run, and Outdoor/Indoor Walk.

The query first selects the timestamp and workout activity type which is then followed by all pertinent location data present in the “location_series_data” table. This query provides the

following result:

The previously viewed location data is now correlated to the timestamp in readable format and an Outdoor/Indoor Walk. The query is effective as it uses a cross join and a where statement to isolate the data required and pull data from both pertinent tables. A cross join can be used to combine tables that have no like columns or possess values within columns that do not directly link to another table. This query does not function correctly without the where statement:

“WHERE (substr(location_series_data.timestamp,1,9)) between workout_activities.start_date and workout_activities.end_date”, the where statement used in this query, effectively states the following:

[Select values] where the location series data timestamp is between the workout activity start date and the workout activity end date.

(The substring (substr) function is used here to isolate the whole value of the timestamp and remove the decimal point and all following numbers. So “668189163.854” becomes “668189163”).

While I label this query as being inefficient in processing time, I do maintain the join and where structure used here could be effective in other means. In this use it was overall quite effective in the result number and once the result was processed a .CSV file could be exported from DB Browser for other use and additional review.

An Efficient and Effective SQL Query:

Still, even after reviewing literally every other table within the database and no common column discovered to link the two tables more efficiently, I was still bothered by the solution so far and knew there must be something missing. The solution was more complicated which hid the linking values momentarily.

But, the links were ultimately found.

There was not one but two tables containing values linking the “location_series_data” location data to the “workout_activities.activity_type” data!

Now it was a simple matter of joining like data between the “location_series_data”, “data_series”, “associations”, and “workout_activities” tables. Left outer joins do the trick:

Above, the “location_series_data.series_identifier” value was “25”. This value corresponded to the “hfd_key” value from the “data_series” table. The “data_series” table possessed a “data_id” column, the value of which corresponded to the “child_id” value of the “associations” table. The “associations” table also possessed a “parent_id” value which corresponded to the “owner_id” of the “workout_activities” table. Finally, the “workout_activities” table possessed our “activity_type” value!

Within the SQL Query, our joins are:

LEFT OUTER JOIN data_series on data_series.hfd_key = location_series_data.series_identifier

LEFT OUTER JOIN associations on associations.child_id = data_series.data_id

LEFT OUTER JOIN workout_activities on workout_activities.owner_id = associations.parent_id

Replacing the previously used cross join and where statement, we now have the following SQL Query:

The result for this query is largely the same as before:

The total result for this query was 44,590 rows of data which returned in 399ms, or 0.399 seconds. Not only was this result about 82 times faster in processing time, it returned an additional nine rows that were missed in the former query.

Analyzing Location Data Values, Perceived User Activity:

Another column within the “data_series” table is useful in analyzing location data. The “count” column contains the total number of location data points within each data series – or each workout. This value by itself is beneficial in understanding the location data yield for a set period of time but when this column is joined with other pertinent columns, we can glean insight into likely user activities during the workouts.

Because we have joined the location data to the workout activities, we now possess the corresponding workout start time, workout end time, and workout duration to review alongside our location data timestamps.

The SQL query for analysis is as follows:

The query results in 11 columns which possess a lot of information about our workouts and our location data. As shown in the query, these columns are obtained from the following data locations:

Column NameData Location
Workout Start Timeworkout_activities.start_date
Min Location Timestampmin(location_series_data.timestamp)
Workout End Timeworkout_activities.end_date
Max Location Timestampmax(location_series_data.timestamp)
Number of Location Pointsdata_series.count
Expected Number of Location Points((Workout end time – start time) * (max location timestamp – min location timestamp)) / data_series.count
Workout Typeworkout_activities.activity_type
Elapsed Time(workout_activities.end_time – workout_activities.start_time)
Workout Timeworkout_activities.duration
Location Data Capture Timespan(max(location_series_data.timestamp) – min(location_series_data.timestamp))
Location Data Capture Average (in Seconds)((max(location_series_data.timestamp) – min(location_series_data.timestamp)) / data_series.count)

Here is a data breakdown for five workouts:

A review of the actual number of location points compared to the expected number of location points, centered in the return columns, provides an indication of where we need to review further. Rows 1, 3, and 4 possess an expected number of locations points with little difference to the actual number of location points. Rows 2 and 5 both contain higher values for expected number of location points than actual location points, 531 to 181 and 1,241 to 334. These data differences indicate a need to look further into this data.

Comparing the time values for Elapsed Time (the total timespan from start to end of the workout), Workout Time, and Location Data Capture Timespan, we can see some indications of what happened.

In row 2 we can see our elapsed time and location data capture timespans are the same, 5 minutes and 10 seconds. The workout time recorded is lower, 3 minutes. This workout was paused at some point after initiation, started again, and then completed. Within the SQL query result, we can see a pause in the location data recording:

From 18:50:40 to 18:52:50, a period of 2 minutes and 10 seconds, no actual location data was recorded; however, the structure of the query is such that the discrepancy in time differences can be observed and reviewed closer.

In row 5 we can see our elapsed and workout times are the same, 20 minutes and 33 seconds. The location data capture timespan is lower, 5 minutes and 36 seconds. This workout was not initiated by the user at the beginning of the outdoor walk – the activity was recorded by the Watch and the user was prompted with the likelihood that a workout was ongoing and the option to initiate the workout. When this option was selected, the Watch displayed the outdoor walk with a starting timestamp of about 15 minutes prior. Also interesting is the fact that the location data recording did not begin until the workout option was selected by the user. This is why there is a discrepancy between the actual number of location points and the expected number of location points.

Conclusion:

iOS 16 added extremely accurate location data, now stored within the healthdb_secure.sqlite database. This article explored the data source for this location data, SQL queries to review the data, data processing inefficiency clues resulting in obtaining additional data, and the review of said data which provided insights into user actions. It is important to note that when completing SQL queries to review and export data what works initially can always be refined and improved and additional insight can be gleaned from the results.



One response to “Apple Fitness Workout Location Data: Leveraging the healthdb_secure.sqlite Database”

  1. […] Loading a custom artifact to your processing can add a single artifact or numerous artifacts depending on how the XML or python script was written. In this case, the custom artifact contains a total of six artifacts. First are iOS 15 and 16 versions of the above Workout artifact, then an Apple Fitness Workout Location Data artifact, an Apple Fitness Workout Location Data Analysis artifact, height, and weight. The majority of these artifacts are covered through this article and the new location data and analysis artifacts are covered through this article. […]

    Like

Leave a comment

About US

Metadata Forensics, LLC’s mission is to provide thorough and accurate digital investigation services. Seeking the digital truth is what we strive for in each case, and client satisfaction is one of our top priorities. Metadata Forensics, LLC has global experience in digital forensic investigations and uses the industry-leading tools in our quest for the answers our clients are seeking.

Here, we will post articles written by our Digital Forensic Examiners; highlighting original research, experiences, and other information to further the Digital Forensic Incident Response Community.

Blog at WordPress.com.