In this tutorial I am going to explain how OAuth 2.0 works and how to apply it when interacting with the Google Analytics API from Python. Google provides a Python package for that purpose, which so far only supports Python 2 though … well.
OAuth 2.0 seems to be quite a mess at first, and Google’s documentation on the subject is not that well organized in my opinion. So with this article I do my best to save you the sweat I had to invest. After all, it’s not that complicated, as you will probably agree.
Getting Started – Registering the Client Application
- Install google-api-python-client
- Download the example Python script (console.py) from GitHub and store it in folder X
- Visit Google Developers Console
- Create project
- APIs & auth > APIs – activate Analytics API
- APIs & auth > Credentials – create new Client ID (Installed Application, default settings)
- APIs & auth > Consent screen – configure “Product Name” and “E-Mail address”
- APIs & auth > Credentials – Download JSON and store as “client_secrets.json” in folder X. The secrets file contains your application’s ID and a secret token to authenticate it.
- Log in to a Google account which can access your Google Analytics account
    {
      "installed": {
        "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
        "auth_uri": "https://accounts.google.com/o/oauth2/auth",
        "client_email": "",
        "client_id": "90...20-3l...li.apps.googleusercontent.com",
        "client_secret": "VO...il",
        "client_x509_cert_url": "",
        "redirect_uris": [
          "urn:ietf:wg:oauth:2.0:oob",
          "oob"
        ],
        "token_uri": "https://accounts.google.com/o/oauth2/token"
      }
    }
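To make the role of the individual fields more tangible, here is a minimal sketch that reads such a secrets file with nothing but the standard library. It is written for Python 3, the helper name load_client_config is made up for illustration, and the values are placeholders. The script from this tutorial does not do this by hand; it lets the client library parse the file.

```python
import json


def load_client_config(text):
    """Return the fields of the "installed" section used during the OAuth 2.0 flow.

    `text` is the content of client_secrets.json. This helper is hypothetical
    and only illustrates the file layout shown above.
    """
    installed = json.loads(text)["installed"]
    return {
        "client_id": installed["client_id"],
        "client_secret": installed["client_secret"],
        "auth_uri": installed["auth_uri"],
        "token_uri": installed["token_uri"],
        # In the file above, the first redirect URI is the copy/paste one.
        "redirect_uri": installed["redirect_uris"][0],
    }


# Placeholder values, not real credentials.
SAMPLE_SECRETS = """
{
  "installed": {
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "client_id": "my-client-id.apps.googleusercontent.com",
    "client_secret": "my-secret",
    "redirect_uris": ["urn:ietf:wg:oauth:2.0:oob", "oob"],
    "token_uri": "https://accounts.google.com/o/oauth2/token"
  }
}
"""

config = load_client_config(SAMPLE_SECRETS)
print(config["client_id"])
print(config["redirect_uri"])
```

The client ID and the secret identify the application, while the two URIs tell it where to send the user for consent and where to fetch tokens afterwards.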
Flowing through OAuth 2.0 Step by Step
The numbers I use here refer to the correspondingly numbered steps in the process chart above.
(1) Run console.py
    .../X/> python2.7 console.py 123456 2014-01-01 2014-12-31 users
This will query GA for the number of users from January 1 2014 until December 31 2014 for the profile with ID 123456. If you replace 123456 with 0, then the script will find out the first available profile ID of your GA account and use that.
(2) The (so called) “Flow” begins with opening a browser and redirecting it to the consent page via the authorization URL. In this case it would be:
    # URL escapings replaced with respective characters
    https://accounts.google.com/o/oauth2/auth?
      scope = https://www.googleapis.com/auth/analytics.readonly
      &redirect_uri = urn:ietf:wg:oauth:2.0:oob
      &response_type = code
      &client_id = 90[...]20-3l[...]ili.apps.googleusercontent.com
      &access_type = offline
The different components are documented here.
- scope : read permissions for Google Analytics
- redirect_uri : authorization code via copy/paste (instead of via HTTP request)
- response_type : for now we want the authorization code
- client_id : for identifying the application
- access_type : eventually we want not just an access token (“online”) but also a refresh token (“offline”) – think: if we have the refresh token we can renew the access token, even when the user is offline.
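The following is a rough sketch of how such an authorization URL can be assembled from these components, using only the Python 3 standard library. The function name build_authorization_url and the client ID are placeholders of mine; in the tutorial's script the client library builds the URL for you.

```python
from urllib.parse import parse_qs, urlencode, urlparse

AUTH_URI = "https://accounts.google.com/o/oauth2/auth"


def build_authorization_url(client_id):
    """Assemble the consent page URL from the five components listed above."""
    params = {
        "scope": "https://www.googleapis.com/auth/analytics.readonly",
        "redirect_uri": "urn:ietf:wg:oauth:2.0:oob",
        "response_type": "code",
        "client_id": client_id,
        "access_type": "offline",
    }
    # urlencode takes care of the URL escaping of ":" and "/".
    return AUTH_URI + "?" + urlencode(params)


url = build_authorization_url("my-client-id.apps.googleusercontent.com")
print(url)

# Decoding the query string again gives back the readable components.
components = parse_qs(urlparse(url).query)
print(components["scope"][0])
```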
(2a) The user is shown a consent page where s/he can confirm the authorization request.
(2b) If the user refuses – that’s that – if not then …
(3) … the authorization server now communicates the authorization code to the application. How this is done depends on the setup. In the case of a web application, the authorization server would simply redirect the user’s browser again, this time to the redirect URI with the authorization code attached to the request. In our case this is not possible because we are working with a simple Python script which does not feature a web server. That’s the reason for the weird redirect URI (urn:ietf:wg:oauth:2.0:oob) specified above: it causes the authorization server to simply display the authorization code in the user’s browser so that (3′) the user can copy it and paste it into the console where the script is already waiting.
(4) With the authorization code at hand, the application can now request an access token, which it needs to finally query the Google Analytics API. Actually (because access_type is set to “offline” in the authorization URL above) we (5) get not just an access token but also a refresh token, so our application can automatically request a new access token once the old one has expired, without requiring user interaction.
Steps (2) to (5) are covered by:
    import webbrowser

    from oauth2client import client

    def acquire_oauth2_credentials(secrets_file):
        """Flows through OAuth 2.0 authorization process for credentials."""
        flow = client.flow_from_clientsecrets(
            secrets_file,
            scope='https://www.googleapis.com/auth/analytics.readonly',
            redirect_uri='urn:ietf:wg:oauth:2.0:oob')

        # (2) open the consent page in the user's browser
        auth_uri = flow.step1_get_authorize_url()
        webbrowser.open(auth_uri)

        # (3') the user pastes the code shown in the browser (raw_input: Python 2)
        auth_code = raw_input('Enter the authentication code: ')

        # (4),(5) exchange the authorization code for access and refresh token
        credentials = flow.step2_exchange(auth_code)
        return credentials
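Conceptually, steps (4) and (5) boil down to a POST request to the token URI with a form-encoded body, as described in the OAuth 2.0 specification (RFC 6749). The sketch below only builds these bodies, for the initial exchange and for a later refresh, and sends nothing. It is written for Python 3, the function names and all values are placeholders of mine, and the client library may construct its requests differently in detail.

```python
from urllib.parse import parse_qs, urlencode

REDIRECT_URI = "urn:ietf:wg:oauth:2.0:oob"


def code_exchange_body(auth_code, client_id, client_secret):
    """Form body for steps (4)/(5): trade the authorization code for tokens."""
    return urlencode({
        "grant_type": "authorization_code",
        "code": auth_code,
        "client_id": client_id,
        "client_secret": client_secret,
        # Must match the redirect URI used in the authorization URL.
        "redirect_uri": REDIRECT_URI,
    })


def refresh_body(refresh_token, client_id, client_secret):
    """Form body for renewing an access token without user interaction."""
    return urlencode({
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
        "client_secret": client_secret,
    })


exchange = parse_qs(code_exchange_body("pasted-code", "my-id", "my-secret"))
refresh = parse_qs(refresh_body("stored-refresh-token", "my-id", "my-secret"))
print(exchange["grant_type"][0])
print(refresh["grant_type"][0])
```

Note that the refresh request needs neither the authorization code nor the redirect URI, which is why it works while the user is offline.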
Querying Google Analytics
The credentials object resulting from the authorization dance is then used to extend an HTTP object, which in turn is used for creating an API-specific service object.
    import httplib2
    from googleapiclient import discovery

    def create_service_object(credentials):
        """Creates Service object for credentials."""
        # Wrap a plain HTTP object so that requests carry the access token.
        http_auth = credentials.authorize(httplib2.Http())
        service = discovery.build('analytics', 'v3', http=http_auth)
        return service
The API call is then built on top of the service object by chaining methods. Succinct interactive documentation of the possible API calls is provided by Google’s API Explorer. The first item, analytics.data.ga.get, corresponds to our call here.
    def query_analytics(service, profile_id, start_date, end_date, metric):
        """Performs simple query for profile."""
        # Profile ID "0" (or None) means: use the first available profile.
        # get_first_profile_id() is not shown here.
        if profile_id is None or profile_id == "0":
            profile_id = get_first_profile_id(service)

        result = service.data().ga().get(
            ids='ga:' + profile_id,
            start_date=start_date,
            end_date=end_date,
            metrics='ga:' + metric
        ).execute()

        return result
The result of the query will be encapsulated in a JSON object:
    {
      "columnHeaders": [
        {
          "columnType": "METRIC",
          "dataType": "INTEGER",
          "name": "ga:users"
        }
      ],
      "containsSampledData": false,
      "id": "https://www.googleapis.com/analytics/v3/data/ga?ids=ga:123456&metrics=ga:users&start-date=2014-01-01&end-date=2014-12-31",
      "itemsPerPage": 1000,
      "kind": "analytics#gaData",
      "profileInfo": {
        "accountId": "654321",
        "internalWebPropertyId": "64749522",
        "profileId": "123456",
        "profileName": "All Web Site Data",
        "tableId": "ga:123456",
        "webPropertyId": "UA-654321-1"
      },
      "query": {
        "end-date": "2014-12-31",
        "ids": "ga:123456",
        "max-results": 1000,
        "metrics": [
          "ga:users"
        ],
        "start-date": "2014-01-01",
        "start-index": 1
      },
      "rows": [
        [
          "61656"
        ]
      ],
      "selfLink": "https://www.googleapis.com/analytics/v3/data/ga?ids=ga:123456&metrics=ga:users&start-date=2014-01-01&end-date=2014-12-31",
      "totalResults": 1,
      "totalsForAllResults": {
        "ga:users": "61656"
      }
    }
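To get from this structure to an actual number, the column headers can be paired with the rows. Here is a small sketch under the assumption that the result has the shape shown above; the helper name rows_as_dicts is mine, and the sample is a trimmed copy of the result. Note that the values arrive as strings and have to be converted.

```python
def rows_as_dicts(result):
    """Pair each row of the result with the names from columnHeaders."""
    names = [header["name"] for header in result["columnHeaders"]]
    # Defensive: fall back to an empty list if there is no "rows" key.
    return [dict(zip(names, row)) for row in result.get("rows", [])]


# Trimmed copy of the result shown above.
SAMPLE_RESULT = {
    "columnHeaders": [
        {"columnType": "METRIC", "dataType": "INTEGER", "name": "ga:users"}
    ],
    "rows": [["61656"]],
    "totalsForAllResults": {"ga:users": "61656"},
}

rows = rows_as_dicts(SAMPLE_RESULT)
users = int(rows[0]["ga:users"])
print(users)
```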
Storing and Reusing Credentials
The credentials, that is the access token and the refresh token, are going to be valid beyond the session during which they were acquired. Hence it makes sense to store and reuse them. For that purpose a Storage class is provided for convenient persistence. I nonetheless chose to use serialization with JSON here, as this method is more versatile.
    import os

    from oauth2client import client

    def read_credentials(fname):
        """Reads JSON with credentials from file."""
        if os.path.isfile(fname):
            with open(fname, "r") as f:
                credentials = client.OAuth2Credentials.from_json(f.read())
        else:
            credentials = None

        return credentials

    def write_credentials(fname, credentials):
        """Writes credentials as JSON to file."""
        with open(fname, "w") as f:
            f.write(credentials.to_json())
A serialized credentials object looks as follows:
    {
      "_class": "OAuth2Credentials",
      "_module": "oauth2client.client",
      "access_token": "ya29.GAH...ZsrXIGg",
      "client_id": "90...20-3l...li.apps.googleusercontent.com",
      "client_secret": "VO...il",
      "id_token": null,
      "invalid": false,
      "refresh_token": "1/03wB...TK_SUI",
      "revoke_uri": "https://accounts.google.com/o/oauth2/revoke",
      "token_expiry": "2015-02-12T16:09:13Z",
      "token_response": {
        "access_token": "ya29.GAH...ZsrXIGg",
        "expires_in": 3600,
        "refresh_token": "1/03wB...TK_SUI",
        "token_type": "Bearer"
      },
      "token_uri": "https://accounts.google.com/o/oauth2/token",
      "user_agent": null
    }
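The token_expiry field tells when the access token stops being valid, which is the moment the refresh token comes into play. As an illustration only, here is a sketch in Python 3 that checks a serialized credentials file for an expired access token. The function name needs_refresh and the 60 second safety margin are assumptions of mine, and the client library normally deals with refreshing on its own.

```python
import json
from datetime import datetime, timedelta, timezone


def needs_refresh(serialized, now=None, margin_seconds=60):
    """Return True if the stored access token is expired or about to expire."""
    data = json.loads(serialized)
    # token_expiry is a UTC timestamp such as "2015-02-12T16:09:13Z".
    expiry = datetime.strptime(data["token_expiry"], "%Y-%m-%dT%H:%M:%SZ")
    expiry = expiry.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expiry - timedelta(seconds=margin_seconds)


# Only the field that matters here, taken from the example above.
SAMPLE_CREDENTIALS = '{"token_expiry": "2015-02-12T16:09:13Z"}'

shortly_after_issue = datetime(2015, 2, 12, 15, 10, 0, tzinfo=timezone.utc)
just_before_expiry = datetime(2015, 2, 12, 16, 9, 0, tzinfo=timezone.utc)

print(needs_refresh(SAMPLE_CREDENTIALS, now=shortly_after_issue))
print(needs_refresh(SAMPLE_CREDENTIALS, now=just_before_expiry))
```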
(original article published on www.joyofdata.de)
Thanks for the article! That graph helps a lot to understand the process.
Assuming the data is public (no need to authenticate, like a tweet), why do I still need the authorization server? Can the authentication be skipped in that case since the data I’m trying to access is public?
Great writeup. I agree with you: much sweat equity to get this set up, but once it’s up it’s pretty simple to understand. For some of your readers, if they are looking to move data in/out of GA through a scheduled process, they may want to consider switching to a Service Account. This avoids the need for the consent screen to authorize access. The Service Account setup also caused extensive headaches, and the documentation on Google is lacking. I documented the process to get that up and running here (https://smallmeansbig.wordpress.com/2014/11/23/how-to-call-a-google-api-using-a-service-account-part-1-of-2/). The example uses Python and a Service Account with the Content API for Shopping, but the concept is the same.
Hey David, that’s a good hint from you regarding usage of a service account in case of automated regular data loading. Kudos for your write-up!