What exactly IS an API? Theyโre those things that you copy and paste long strange codes into Screaming Frog for links data on a Site Crawl, right?
Iโm here to tell you thereโs so much more to them than that โ if youโre willing to take just a few little steps. But first, some basics.
Whatโs an API?
API stands for โapplication programming interfaceโ, and itโs just the way ofโฆ using a thing. Everything has an API. The web is a giant API that takes URLs as input and returns pages.
But special data services like the Moz Links API have their own set of rules. These rules vary from service to service and can be a major stumbling block for people taking the next step.
When Screaming Frog gives you the extra links columns in a crawl, itโs using the Moz Links API, but you can have this capability anywhere. For example, all that tedious manual stuff you do in spreadsheet environments can be automated from data-pull to formatting and emailing a report.
If you take this next step, you can be more efficient than your competitors, designing and delivering your own SEO services instead of relying upon, paying for, and being limited by the next proprietary product integration.
GET vs. POST
Most APIs youโll encounter use the same data transport mechanism as the web. That means thereโs a URL involved just like a website. Donโt get scared! Itโs easier than you think. In many ways, using an API is just like using a website.
As with loading web pages, the request may be in one of two places: the URL itself, or in the body of the request. The URL is called the โendpointโ and the often invisibly submitted extra part of the request is called the โpayloadโ or โdataโ. When the data is in the URL, itโs called a โquery stringโ and indicates the โGETโ method is used. You see this all the time when you search:
https://www.google.com/search?q=moz+links+api <-- GET method
When the data of the request is hidden, itโs called a โPOSTโ request. You see this when you submit a form on the web and the submitted data does not show on the URL. When you hit the back button after such a POST, browsers usually warn you against double-submits. The reason the POST method is often used is that you can fit a lot more in the request using the POST method than the GET method. URLs would get very long otherwise. The Moz Links API uses the POST method.
Making requests
A web browser is what traditionally makes requests of websites for web pages. The browser is a type of software known as a client. Clients are what make requests of services. More than just browsers can make requests. The ability to make client web requests is often built into programming languages like Python, or can be broken out as a standalone tool. The most popular tools for making requests outside a browser are curl and wget.
We are discussing Python here. Python has a built-in library called URLLIB, but itโs designed to handle so many different types of requests that itโs a bit of a pain to use. There are other libraries that are more specialized for making requests of APIs. The most popular for Python is called requests. Itโs so popular that itโs used for almost every Python API tutorial youโll find on the web. So I will use it too. This is what โhittingโ the Moz Links API looks like:
response = requests.post(endpoint, data=json_string, auth=auth_tuple)
Given that everything was set up correctly (more on that soon), this will produce the following output:
'next_token': 'JYkQVg4s9ak8iRBWDiz1qTyguYswnj035nqrQ1oIbW96IGJsb2dZgGzDeAM7Rw==', 'results': ['anchor_text': 'moz', 'external_pages': 7162, 'external_root_domains': 2026]
This is JSON data. It’s contained within the response object that was returned from the API. Itโs not on the drive or in a file. Itโs in memory. So long as itโs in memory, you can do stuff with it (often just saving it to a file).
If you wanted to grab a piece of data within such a response, you could refer to it like this:
response['results'][0]['external_pages']
This says: โGive me the first item in the results list, and then give me the external_pages value from that item.โ The result would be 7162.
NOTE: If youโre actually following along executing code, the above line wonโt work alone. Thereโs a certain amount of setup weโll do shortly, including installing the requests library and setting up a few variables. But this is the basic idea.
JSON
JSON stands for JavaScript Object Notation. Itโs a way of representing data in a way thatโs easy for humans to read and write. Itโs also easy for computers to read and write. Itโs a very common data format for APIs that has somewhat taken over the world since the older ways were too difficult for most people to use. Some people might call this part of the โrestfulโ API movement, but the much more difficult XML format is also considered โrestfulโ and everyone seems to have their own interpretation. Consequently, I find it best to just focus on JSON and how it gets in and out of Python.
Python dictionaries
I lied to you. I said that the data structure you were looking at above was JSON. Technically itโs really a Python dictionary or dict datatype object. Itโs a special kind of object in Python thatโs designed to hold key/value pairs. The keys are strings and the values can be any type of object. The keys are like the column names in a spreadsheet. The values are like the cells in the spreadsheet. In this way, you can think of a Python dict as a JSON object. For example hereโs creating a dict in Python:
my_dict = "name": "Mike", "age": 52, "city": "New York"
And here is the equivalent in JavaScript:
var my_json = "name": "Mike", "age": 52, "city": "New York"
Pretty much the same thing, right? Look closely. Key-names and string values get double-quotes. Numbers donโt. These rules apply consistently between JSON and Python dicts. So as you might imagine, itโs easy for JSON data to flow in and out of Python. This is a great gift that has made modern API-work highly accessible to the beginner through a tool that has revolutionized the field of data science and is making inroads into marketing, Jupyter Notebooks.
Flattening data
But beware! As data flows between systems, itโs not uncommon for the data to subtly change. For example, the JSON data above might be converted to a string. Strings might look exactly like JSON, but theyโre not. Theyโre just a bunch of characters. Sometimes youโll hear it called โserializingโ, or โflatteningโ. Itโs a subtle point, but worth understanding as it will help with one of the largest stumbling blocks with the Moz Links (and most JSON) APIs.
Objects have APIs
Actual JSON or dict objects have their own little APIs for accessing the data inside of them. The ability to use these JSON and dict APIs goes away when the data is flattened into a string, but it will travel between systems more easily, and when it arrives at the other end, it will be โdeserializedโ and the API will come back on the other system.
Data flowing between systems
This is the concept of portable, interoperable data. Back when it was called Electronic Data Interchange (or EDI), it was a very big deal. Then along came the web and then XML and then JSON and now itโs just a normal part of doing business.
If youโre in Python and you want to convert a dict to a flattened JSON string, you do the following:
import json my_dict = "name": "Mike", "age": 52, "city": "New York" json_string = json.dumps(my_dict)
โฆwhich would produce the following output:
'"name": "Mike", "age": 52, "city": "New York"'
This looks almost the same as the original dict, but if you look closely you can see that single-quotes are used around the entire thing. Another obvious difference is that you can line-wrap real structured data for readability without any ill effect. You can’t do it so easily with strings. Thatโs why itโs presented all on one line in the above snippet.
Such stringifying processes are done when passing data between different systems because they are not always compatible. Normal text strings on the other hand are compatible with almost everything and can be passed on web-requests with ease. Such flattened strings of JSON data are frequently referred to as the request.
Anatomy of a request
Again, hereโs the example request we made above:
response = requests.post(endpoint, data=json_string, auth=auth_tuple)
Now that you understand what the variable name json_string is telling you about its contents, you shouldnโt be surprised to see this is how we populate that variable:
data_dict = "target": "moz.com/blog", "scope": "page", "limit": 1 json_string = json.dumps(data_dict)
โฆand the contents of json_string looks like this:
'"target": "moz.com/blog", "scope": "page", "limit": 1'
This is one of my key discoveries in learning the Moz Links API. This is in common with countless other APIs out there but trips me up every time because itโs so much more convenient to work with structured dicts than flattened strings. However, most APIs expect the data to be a string for portability between systems, so we have to convert it at the last moment before the actual API-call occurs.
Pythonic loads and dumps
Now you may be wondering in that above example, what a dump is doing in the middle of the code. The json.dumps() function is called a โdumperโ because it takes a Python object and dumps it into a string. The json.loads() function is called a โloaderโ because it takes a string and loads it into a Python object.
The reason for what appear to be singular and plural options are actually binary and string options. If your data is binary, you use json.load() and json.dump(). If your data is a string, you use json.loads() and json.dumps(). The s stands for string. Leaving the s off means binary.
Donโt let anybody tell you Python is perfect. Itโs just that its rough edges are not excessively objectionable.
Assignment vs. equality
For those of you completely new to Python or programming in general, what weโre doing when we hit the API is called an assignment. The result of requests.post() is being assigned to the variable named response.
response = requests.post(endpoint, data=json_string, auth=auth_tuple)
We are using the = sign to assign the value of the right side of the equation to the variable on the left side of the equation. The variable response is now a reference to the object that was returned from the API. Assignment is different from equality. The == sign is used for equality.
# This is assignment: a = 1 # a is now equal to 1 # This is equality: a == 1 # True, but relies that the above line has been executed
The POST method
response = requests.post(endpoint, data=json_string, auth=auth_tuple)
The requests library has a function called post() that takes 3 arguments. The first argument is the URL of the endpoint. The second argument is the data to send to the endpoint. The third argument is the authentication information to send to the endpoint.
Keyword parameters and their arguments
You may notice that some of the arguments to the post() function have names. Names are set equal to values using the = sign. Hereโs how Python functions get defined. The first argument is positional both because it comes first and also because thereโs no keyword. Keyworded arguments come after position-dependent arguments. Trust me, it all makes sense after a while. We all start to think like Guido van Rossum.
def arbitrary_function(argument1, name=argument2): # do stuff
The name in the above example is called a โkeywordโ and the values that come in on those locations are called โargumentsโ. Now arguments are assigned to variable names right in the function definition, so you can refer to either argument1 or argument2 anywhere inside this function. If youโd like to learn more about the rules of Python functions, you can read about them here.
Setting up the request
Okay, so letโs let you do everything necessary for that success assured moment. Weโve been showing the basic request:
response = requests.post(endpoint, data=json_string, auth=auth_tuple)
โฆbut we havenโt shown everything that goes into it. Letโs do that now. If youโre following along and donโt have the requests library installed, you can do so with the following command from the same terminal environment from which you run Python:
pip install requests
Often times Jupyter will have the requests library installed already, but in case it doesnโt, you can install it with the following command from inside a Notebook cell:
!pip install requests
And now we can put it all together. Thereโs only a few things here that are new. The most important is how weโre taking 2 different variables and combining them into a single variable called AUTH_TUPLE. You will have to get your own ACCESSID and SECRETKEY from the Moz.com website.
The API expects these two values to be passed as a Python data structure called a tuple. A tuple is a list of values that donโt change. I find it interesting that requests.post() expects flattened strings for the data parameter, but expects a tuple for the auth parameter. I suppose it makes sense, but these are the subtle things to understand when working with APIs.
Hereโs the full code:
import json import pprint import requests # Set Constants ACCESSID = "mozscape-1234567890" # Replace with your access ID SECRETKEY = "1234567890abcdef1234567890abcdef" # Replace with your secret key AUTH_TUPLE = (ACCESSID, SECRETKEY) # Set Variables endpoint = "https://lsapi.seomoz.com/v2/anchor_text" data_dict = "target": "moz.com/blog", "scope": "page", "limit": 1 json_string = json.dumps(data_dict) # Make the Request response = requests.post(endpoint, data=json_string, auth=AUTH_TUPLE) # Print the Response pprint(response.json())
โฆwhich outputs:
'next_token': 'JYkQVg4s9ak8iRBWDiz1qTyguYswnj035nqrQ1oIbW96IGJsb2dZgGzDeAM7Rw==', 'results': ['anchor_text': 'moz', 'external_pages': 7162, 'external_root_domains': 2026]
Using all upper case for the AUTH_TUPLE variable is a convention many use in Python to indicate that the variable is a constant. Itโs not a requirement, but itโs a good idea to follow conventions when you can.
You may notice that I didnโt use all uppercase for the endpoint variable. Thatโs because the anchor_text endpoint is not a constant. There are a number of different endpoints that can take its place depending on what sort of lookup we wanted to do. The choices are:
-
anchor_text
-
final_redirect
-
global_top_pages
-
global_top_root_domains
-
index_metadata
-
link_intersect
-
link_status
-
linking_root_domains
-
links
-
top_pages
-
url_metrics
-
usage_data
And that leads into the Jupyter Notebook that I prepared on this topic located here on Github. With this Notebook you can extend the example I gave here to any of the 12 available endpoints to create a variety of useful deliverables, which will be the subject of articles to follow.