Audio Features & Analysis

API #

Implementation used to make API calls.

Parameters:

get_tracks_audio_features #

get_tracks_audio_features(track_id: str) -> AudioFeatures

Get audio feature information for a single track.

Parameters:

  • track_id (str) –

    The ID of the track.

Returns:

  • AudioFeatures –

    The audio features of the track.

get_several_tracks_audio_features #

get_several_tracks_audio_features(track_ids: list[str]) -> list[AudioFeatures]

Get audio feature information for several tracks.

Parameters:

  • track_ids (list[str]) –

    The IDs of the tracks. Maximum: 100.

Returns:

  • list[AudioFeatures] –

    The audio features of the tracks.
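Because track_ids accepts at most 100 IDs, a longer list has to be split before calling the method. The following is a minimal sketch of that batching step only; chunked is a hypothetical helper that is not part of this library, and the IDs are placeholders.

```python
from collections.abc import Iterator

MAX_IDS_PER_CALL = 100  # limit stated for track_ids above


def chunked(ids: list[str], size: int = MAX_IDS_PER_CALL) -> Iterator[list[str]]:
    """Yield consecutive slices of `ids`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Placeholder IDs for illustration (not real Spotify IDs).
batches = list(chunked([f"id{i}" for i in range(250)]))
```

Each batch could then be passed to get_several_tracks_audio_features and the returned lists concatenated.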

get_tracks_audio_analysis #

get_tracks_audio_analysis(track_id: str) -> AudioAnalysis

Get a low-level audio analysis for a track in the Spotify catalog. The audio analysis describes the track's structure and musical content, including rhythm, pitch and timbre.

Parameters:

  • track_id (str) –

    The ID of the track.

Returns:

  • AudioAnalysis –

    The audio analysis of the track.

AudioFeatures #

Track audio features.

acousticness #

acousticness: float

A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.

analysis_url #

analysis_url: str

A link to the Web API endpoint providing full details of the audio analysis.

danceability #

danceability: float

Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.

duration #

duration: timedelta = Field(alias='duration_ms')

The duration of the track.
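The alias suggests that the field is populated from the duration_ms value of the API response. How the library performs the conversion is not shown here; the arithmetic is equivalent to this standard-library sketch, in which the sample value is made up.

```python
from datetime import timedelta

duration_ms = 207_959  # hypothetical raw duration_ms value

# Interpret the millisecond count as a timedelta.
duration = timedelta(milliseconds=duration_ms)

# A timedelta exposes its length in seconds as a float.
seconds = duration.total_seconds()

# Format as minutes:seconds, discarding the fractional part.
minutes, remainder = divmod(int(seconds), 60)
label = f"{minutes}:{remainder:02d}"
```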

energy #

energy: float

Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale. Perceptual features contributing to this attribute include dynamic range, perceived loudness, timbre, onset rate, and general entropy.

id #

id: str

The Spotify ID for the track.

instrumentalness #

instrumentalness: float

Predicts whether a track contains no vocals. "Ooh" and "aah" sounds are treated as instrumental in this context. Rap or spoken word tracks are clearly "vocal". The closer the instrumentalness value is to 1.0, the greater the likelihood that the track contains no vocal content. Values above 0.5 are intended to represent instrumental tracks, but confidence is higher as the value approaches 1.0.

key #

key: int

The key the track is in. Integers map to pitches using standard Pitch Class notation. E.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1.
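The integer-to-pitch mapping can be written out as a lookup table. A minimal sketch, assuming the usual twelve pitch classes; key_name is a hypothetical helper and not part of this library.

```python
from typing import Optional

# Pitch classes 0..11 in standard Pitch Class notation.
PITCH_CLASSES = [
    "C", "C♯/D♭", "D", "D♯/E♭", "E", "F",
    "F♯/G♭", "G", "G♯/A♭", "A", "A♯/B♭", "B",
]


def key_name(key: int) -> Optional[str]:
    """Return the pitch name for `key`, or None when no key was detected (-1)."""
    if key == -1:
        return None
    if not 0 <= key < len(PITCH_CLASSES):
        raise ValueError(f"unexpected key value: {key}")
    return PITCH_CLASSES[key]
```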

liveness #

liveness: float

Detects the presence of an audience in the recording. Higher liveness values represent an increased probability that the track was performed live. A value above 0.8 indicates a strong likelihood that the track is live.

loudness #

loudness: float

The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.

mode #

mode: TrackMode

Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived.

speechiness #

speechiness: float

Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value. Values above 0.66 describe tracks that are probably made entirely of spoken words. Values between 0.33 and 0.66 describe tracks that may contain both music and speech, either in sections or layered, including such cases as rap music. Values below 0.33 most likely represent music and other non-speech-like tracks.
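The three ranges described above can be expressed as a small classifier. This is only a sketch: describe_speechiness and its labels are hypothetical, and the thresholds are taken directly from the description.

```python
def describe_speechiness(value: float) -> str:
    """Map a speechiness value to one of the three ranges described above."""
    if value > 0.66:
        return "speech"  # probably made entirely of spoken words
    if value >= 0.33:
        return "mixed"   # may contain both music and speech
    return "music"       # most likely music or other non-speech audio
```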

tempo #

tempo: float

The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.
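The relationship between tempo and average beat duration can be made concrete: beats per minute is 60 divided by the mean beat length in seconds. A minimal sketch, assuming beat durations are available as timedelta values (as in AudioAnalysisBeat.duration); bpm_from_beats is a hypothetical helper, and this is not a description of how Spotify's own estimator works.

```python
from datetime import timedelta


def bpm_from_beats(durations: list[timedelta]) -> float:
    """Estimate tempo in BPM from the mean beat duration."""
    if not durations:
        raise ValueError("at least one beat duration is required")
    mean_seconds = sum(d.total_seconds() for d in durations) / len(durations)
    if mean_seconds <= 0:
        raise ValueError("beat durations must be positive")
    return 60.0 / mean_seconds


# Made-up example: eight beats of half a second each.
beats = [timedelta(milliseconds=500)] * 8
estimated_bpm = bpm_from_beats(beats)
```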

time_signature #

time_signature: int

An estimated time signature. The time signature (meter) is a notational convention to specify how many beats are in each bar (or measure). The time signature ranges from 3 to 7 indicating time signatures of "3/4" to "7/4".

track_href #

track_href: str

A link to the Web API endpoint providing full details of the track.

uri #

uri: str

The Spotify URI for the track.

valence #

valence: float

A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

AudioAnalysis #

Track audio analysis.

meta #

Metadata for the analysis.

track #

Track information.

bars #

The time intervals of the bars throughout the track.

beats #

The time intervals of beats throughout the track.

sections #

Sections are defined by large variations in rhythm or timbre, e.g. chorus, verse, bridge, guitar solo, etc. Each section contains its own descriptions of tempo, key, mode, time_signature, and loudness.

segments #

Each segment contains a roughly consistent sound throughout its duration.

tatums #

A tatum represents the lowest regular pulse train that a listener intuitively infers from the timing of perceived musical events (segments).

AudioAnalysisMeta #

Audio analysis metadata.

analyzer_version #

analyzer_version: str

The version of the Analyzer used to analyze the track.

platform #

platform: str

The platform used to read the track's audio data.

detailed_status #

detailed_status: str

A detailed status code for the track. If analysis data is missing, this code may explain why.

status_code #

status_code: StatusCode

The return code of the analyzer process.

timestamp #

timestamp: datetime

The time at which the track was analyzed.

analysis_time #

analysis_time: timedelta

The amount of time taken to analyze the track.

input_process #

input_process: str

The method used to read the track's audio data.

AudioAnalysisTrack #

Audio analysis track information.

num_samples #

num_samples: int

The exact number of audio samples analyzed from the track. See also analysis_sample_rate.

duration #

duration: float

Length of the track in seconds.

sample_md5 #

sample_md5: str

This field will always contain an empty string.

offset_seconds #

offset_seconds: int

An offset to the start of the region of the track that was analyzed. (As the entire track is analyzed, this should always be 0.)

window_seconds #

window_seconds: int

The length of the region of the track that was analyzed, if a subset of the track was analyzed. (As the entire track is analyzed, this should always be 0.)

analysis_sample_rate #

analysis_sample_rate: int

The sample rate used to decode and analyze the track. May differ from the actual sample rate of the track available on Spotify.
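Taken together, num_samples and analysis_sample_rate give the analyzed length in seconds: the number of samples divided by the number of samples per second. A sketch with made-up values; analyzed_seconds is a hypothetical helper.

```python
def analyzed_seconds(num_samples: int, analysis_sample_rate: int) -> float:
    """Length of the analyzed audio in seconds."""
    if analysis_sample_rate <= 0:
        raise ValueError("analysis_sample_rate must be positive")
    return num_samples / analysis_sample_rate


# Hypothetical values for illustration only.
length = analyzed_seconds(num_samples=4_585_515, analysis_sample_rate=22_050)
```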

analysis_channels #

analysis_channels: int

The number of channels used for analysis. If 1, all channels are summed together to mono before analysis.

end_of_fade_in #

end_of_fade_in: float

The time, in seconds, at which the track's fade-in period ends. If the track has no fade-in, this will be 0.0.

start_of_fade_out #

start_of_fade_out: float

The time, in seconds, at which the track's fade-out period starts. If the track has no fade-out, this should match the track's length.
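The two fade markers can be turned into fade lengths. A minimal sketch, assuming the fade-in begins at the start of the track and the fade-out runs to its end; fade_lengths is a hypothetical helper and all values are in seconds.

```python
def fade_lengths(
    duration: float, end_of_fade_in: float, start_of_fade_out: float
) -> tuple[float, float]:
    """Return (fade-in length, fade-out length) in seconds."""
    fade_in = end_of_fade_in  # measured from the start of the track
    fade_out = max(0.0, duration - start_of_fade_out)  # 0.0 when there is no fade-out
    return fade_in, fade_out
```

A track with no fades yields (0.0, 0.0), because end_of_fade_in is 0.0 and start_of_fade_out matches the track's length.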

loudness #

loudness: float

The overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track and are useful for comparing relative loudness of tracks. Loudness is the quality of a sound that is the primary psychological correlate of physical strength (amplitude). Values typically range between -60 and 0 dB.

tempo #

tempo: float

The overall estimated tempo of a track in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.

tempo_confidence #

tempo_confidence: float

The confidence, from 0.0 to 1.0, of the reliability of the tempo.

time_signature #

time_signature: int

An estimated time signature. The time signature (meter) is a notational convention to specify how many beats are in each bar (or measure). The time signature ranges from 3 to 7 indicating time signatures of "3/4" to "7/4".

time_signature_confidence #

time_signature_confidence: float

The confidence, from 0.0 to 1.0, of the reliability of the time_signature.

key #

key: int

The key the track is in. Integers map to pitches using standard Pitch Class notation. E.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on. If no key was detected, the value is -1.

key_confidence #

key_confidence: float

The confidence, from 0.0 to 1.0, of the reliability of the key.

mode #

mode: TrackMode

Mode indicates the modality (major or minor) of a track, the type of scale from which its melodic content is derived.

mode_confidence #

mode_confidence: float

The confidence, from 0.0 to 1.0, of the reliability of the mode.

codestring #

codestring: str

An Echo Nest Musical Fingerprint (ENMFP) codestring for the track.

code_version #

code_version: float

A version number for the Echo Nest Musical Fingerprint format used in the codestring field.

echoprintstring #

echoprintstring: str

An EchoPrint codestring for the track.

echoprint_version #

echoprint_version: float

A version number for the EchoPrint format used in the echoprintstring field.

synchstring #

synchstring: str

A Synchstring for the track.

synch_version #

synch_version: float

A version number for the Synchstring used in the synchstring field.

rhythmstring #

rhythmstring: str

A Rhythmstring for the track. The format of this string is similar to the Synchstring.

rhythm_version #

rhythm_version: float

A version number for the Rhythmstring used in the rhythmstring field.

AudioAnalysisBar #

Audio analysis of a bar. A bar (or measure) is a segment of time defined as a given number of beats.

start #

start: timedelta

The starting point of the time interval.

duration #

duration: timedelta

The duration of the time interval.

confidence #

confidence: float

The confidence, from 0.0 to 1.0, of the reliability of the interval.
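A common use of these intervals is finding which bar covers a given playback position. The sketch below assumes the intervals are sorted by start; Interval is a hypothetical stand-in with the same three fields as AudioAnalysisBar, and interval_at is not part of this library.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class Interval:
    """Hypothetical stand-in mirroring start, duration and confidence."""
    start: timedelta
    duration: timedelta
    confidence: float


def interval_at(intervals: list[Interval], position: timedelta) -> Optional[Interval]:
    """Return the interval containing `position`, or None if there is none."""
    starts = [interval.start for interval in intervals]
    # Index of the last interval that starts at or before `position`.
    index = bisect_right(starts, position) - 1
    if index < 0:
        return None
    candidate = intervals[index]
    if position < candidate.start + candidate.duration:
        return candidate
    return None


# Made-up data: four consecutive bars of two seconds each.
bars = [Interval(timedelta(seconds=2 * n), timedelta(seconds=2), 0.9) for n in range(4)]
```

The same lookup applies to beats and tatums, which share these fields.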

AudioAnalysisBeat #

Audio analysis of a beat. A beat is the basic time unit of a piece of music; for example, each tick of a metronome. Beats are typically multiples of tatums.

start #

start: timedelta

The starting point of the time interval.

duration #

duration: timedelta

The duration of the time interval.

confidence #

confidence: float

The confidence, from 0.0 to 1.0, of the reliability of the interval.

AudioAnalysisSection #

Audio analysis of a section. Sections are defined by large variations in rhythm or timbre, e.g. chorus, verse, bridge, guitar solo, etc. Each section contains its own descriptions of tempo, key, mode, time_signature, and loudness.

start #

start: timedelta

The starting point of the section.

duration #

duration: timedelta

The duration of the section.

confidence #

confidence: float

The confidence, from 0.0 to 1.0, of the reliability of the section's "designation".

loudness #

loudness: float

The overall loudness of the section in decibels (dB). Loudness values are useful for comparing relative loudness of sections within tracks.

tempo #

tempo: float

The overall estimated tempo of the section in beats per minute (BPM). In musical terminology, tempo is the speed or pace of a given piece and derives directly from the average beat duration.

tempo_confidence #

tempo_confidence: float

The confidence, from 0.0 to 1.0, of the reliability of the tempo. Some tracks contain tempo changes or sounds which don't contain tempo (like pure speech) which would correspond to a low value in this field.

key #

key: int

The estimated overall key of the section. Values range from 0 to 11 and map to pitches using standard Pitch Class notation (e.g. 0 = C, 1 = C♯/D♭, 2 = D, and so on). If no key was detected, the value is -1.

key_confidence #

key_confidence: float

The confidence, from 0.0 to 1.0, of the reliability of the key. Songs with many key changes may correspond to low values in this field.

mode #

mode: TrackMode

Indicates the modality (major or minor) of a section, the type of scale from which its melodic content is derived. Note that a major key (e.g. C major) is easily confused with the minor key 3 semitones lower (e.g. A minor), because both keys contain the same pitches.
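The "3 semitones lower" relationship is simple modular arithmetic on pitch classes. A minimal sketch using the 0 to 11 key encoding described for key; both function names are hypothetical.

```python
def relative_minor(major_key: int) -> int:
    """Pitch class of the minor key that shares its pitches with `major_key`."""
    return (major_key - 3) % 12


def relative_major(minor_key: int) -> int:
    """Pitch class of the major key that shares its pitches with `minor_key`."""
    return (minor_key + 3) % 12
```

For C major (0) this gives 9, which is A, matching the example above.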

mode_confidence #

mode_confidence: float

The confidence, from 0.0 to 1.0, of the reliability of the mode.

time_signature #

time_signature: int

An estimated time signature. The time signature (meter) is a notational convention to specify how many beats are in each bar (or measure). The time signature ranges from 3 to 7 indicating time signatures of "3/4" to "7/4".

time_signature_confidence #

time_signature_confidence: float

The confidence, from 0.0 to 1.0, of the reliability of the time_signature. Sections with time signature changes may correspond to low values in this field.

AudioAnalysisSegment #

Audio analysis of a segment. Each segment contains a roughly consistent sound throughout its duration.

start #

start: timedelta

The starting point of the segment.

duration #

duration: timedelta

The duration of the segment.

confidence #

confidence: float

The confidence, from 0.0 to 1.0, of the reliability of the segmentation. Segments of the song which are difficult to logically segment (e.g. noise) may correspond to low values in this field.

loudness_start #

loudness_start: float

The onset loudness of the segment in decibels (dB). Combined with loudness_max and loudness_max_time, these components can be used to describe the "attack" of the segment.

loudness_max #

loudness_max: float

The peak loudness of the segment in decibels (dB). Combined with loudness_start and loudness_max_time, these components can be used to describe the "attack" of the segment.

loudness_max_time #

loudness_max_time: float

The segment-relative offset of the segment peak loudness in seconds. Combined with loudness_start and loudness_max, these components can be used to describe the "attack" of the segment.

loudness_end #

loudness_end: float

The offset loudness of the segment in decibels (dB). This value should be equivalent to the loudness_start of the following segment.
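One possible way to summarize the "attack" from loudness_start, loudness_max and loudness_max_time is the average rise in dB per second up to the peak. This is an illustrative choice, not a definition taken from Spotify, and attack_slope is a hypothetical helper.

```python
def attack_slope(
    loudness_start: float, loudness_max: float, loudness_max_time: float
) -> float:
    """Average loudness rise in dB per second between onset and peak."""
    rise = loudness_max - loudness_start
    if loudness_max_time <= 0:
        # The peak coincides with the onset: treat any rise as instantaneous.
        return float("inf") if rise > 0 else 0.0
    return rise / loudness_max_time
```

A larger slope indicates a sharper onset, for example a drum hit as opposed to a slowly swelling pad.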

pitches #

pitches: list[float]

Pitch content is given by a "chroma" vector, corresponding to the 12 pitch classes C, C♯, D to B, with values ranging from 0 to 1 that describe the relative dominance of every pitch in the chromatic scale. For example, a C major chord would likely be represented by large values of C, E and G (i.e. classes 0, 4, and 7).

Vectors are normalized to 1 by their strongest dimension, so noisy sounds are likely represented by values that are all close to 1, while pure tones are described by one value at 1 (the pitch) and others near 0. As can be seen below, the 12 vector indices are a combination of low-power spectrum values at their respective pitch frequencies.

[Figure: the 12 pitch vector indices as combinations of low-power spectrum values. Image source: Spotify]
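The normalization and the C major example can be illustrated with plain lists. This is a sketch under stated assumptions: both helpers are hypothetical, the example vector is made up, and the 0.8 cut-off is arbitrary.

```python
PITCH_CLASSES = ["C", "C♯", "D", "D♯", "E", "F", "F♯", "G", "G♯", "A", "A♯", "B"]


def normalize_chroma(raw: list[float]) -> list[float]:
    """Scale a vector so that its strongest dimension equals 1."""
    peak = max(raw)
    if peak <= 0:
        return [0.0] * len(raw)
    return [value / peak for value in raw]


def strong_pitch_classes(pitches: list[float], threshold: float = 0.8) -> list[str]:
    """Names of the pitch classes whose value is at least `threshold`."""
    if len(pitches) != 12:
        raise ValueError("expected 12 pitch-class values")
    return [name for name, value in zip(PITCH_CLASSES, pitches) if value >= threshold]


# Made-up vector resembling a C major chord: classes 0, 4 and 7 dominate.
c_major_like = [1.0, 0.1, 0.2, 0.1, 0.9, 0.1, 0.1, 0.85, 0.1, 0.2, 0.1, 0.1]
```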

timbre #

timbre: list[float]

Timbre is the quality of a musical note or sound that distinguishes different types of musical instruments, or voices. It is a complex notion also referred to as sound color, texture, or tone quality, and is derived from the shape of a segment's spectro-temporal surface, independently of pitch and loudness. The timbre feature is a vector that includes 12 unbounded values roughly centered around 0. Those values are high level abstractions of the spectral surface, ordered by degree of importance.

For reference, the first dimension represents the average loudness of the segment; the second emphasizes brightness; the third is more closely correlated to the flatness of a sound; the fourth to sounds with a stronger attack; and so on. The image below represents the 12 basis functions (i.e. template segments).

[Figure: the 12 timbre basis functions (template segments). Image source: Spotify]

The actual timbre of the segment is best described as a linear combination of these 12 basis functions weighted by the coefficient values: timbre = \(c_1 \times b_1 + c_2 \times b_2 + \ldots + c_{12} \times b_{12}\), where \(c_1\) to \(c_{12}\) represent the 12 coefficients and \(b_1\) to \(b_{12}\) the 12 basis functions as displayed above. Timbre vectors are best used in comparison with each other.
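Since timbre vectors are meant to be compared with each other, one simple option is the Euclidean distance between two 12-value vectors. The choice of metric is an assumption made for illustration; the source does not prescribe one, and timbre_distance is a hypothetical helper.

```python
import math


def timbre_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two 12-dimensional timbre vectors."""
    if len(a) != 12 or len(b) != 12:
        raise ValueError("timbre vectors must have 12 values")
    return math.dist(a, b)


# Made-up vectors that differ only in the first two coefficients.
first = [0.0] * 12
second = [3.0, 4.0] + [0.0] * 10
distance = timbre_distance(first, second)
```

A smaller distance suggests that two segments sound more alike. Because the first coefficient tracks average loudness, it may be worth excluding when loudness should not influence the comparison.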

AudioAnalysisTatum #

Audio analysis of a tatum. A tatum represents the lowest regular pulse train that a listener intuitively infers from the timing of perceived musical events (segments).

start #

start: timedelta

The starting point of the time interval.

duration #

duration: timedelta

The duration of the time interval.

confidence #

confidence: float

The confidence, from 0.0 to 1.0, of the reliability of the interval.