Monitors Google Knowledge Graphs: the script takes a list of brand searches, queries Google, checks for the Knowledge Graph (KG), and records its image. It then compares the images against the previous day's results; results that have changed are flagged with a '1' in the 'data' tab, and the records are placed in a sheet called 'image_change_tracking'.
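The comparison step can be sketched roughly as follows. This is a minimal, dependency-free illustration rather than the script's actual code (which uses pandas): the function name `flag_image_changes` and the sample rows are hypothetical, while the field names follow the 'data' tab headers. It also assumes that a query with no previous-day record is not counted as a change.

```python
def flag_image_changes(previous_rows, current_rows):
    """Return copies of current_rows with a change_detected flag (1 or 0)."""
    # Map each query to the image URL recorded on the previous run.
    previous_images = {
        row["google_query"]: row["kg_image_url"] for row in previous_rows
    }

    flagged = []
    for row in current_rows:
        old_url = previous_images.get(row["google_query"])
        # Assumption: a query with no earlier record is not treated as a change.
        changed = old_url is not None and old_url != row["kg_image_url"]
        flagged.append(
            {**row, "change_detected": int(changed), "old_kg_image_url": old_url}
        )
    return flagged


# Hypothetical sample data for two consecutive days.
yesterday = [
    {"google_query": "acme", "kg_image_url": "https://example.com/a1.png"},
    {"google_query": "globex", "kg_image_url": "https://example.com/g1.png"},
]
today = [
    {"google_query": "acme", "kg_image_url": "https://example.com/a2.png"},
    {"google_query": "globex", "kg_image_url": "https://example.com/g1.png"},
    {"google_query": "initech", "kg_image_url": "https://example.com/i1.png"},
]

results = flag_image_changes(yesterday, today)
# Only the flagged rows would be copied to 'image_change_tracking'.
changed_records = [r for r in results if r["change_detected"] == 1]
```

Here only the "acme" row ends up in `changed_records`, since its image URL differs from the previous day's.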
The script is intended to be run every day. This can be accomplished by running it locally by hand (Ewww), setting up a batch file (Windows) + a scheduled task to run it automagically, or adding proxies to the get_serp(url) function's use of Selenium and throwing it on an EC2 instance + cronjob.
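For the proxy route, one possible shape is sketched below. How get_serp(url) actually builds its driver isn't shown here, so treat this as an assumption: `build_chrome_args` and `make_driver` are hypothetical helper names, and the proxy address is a placeholder. It relies on Chrome's `--proxy-server` switch and Selenium's `ChromeOptions`.

```python
def build_chrome_args(proxy=None, headless=True):
    """Build the Chrome command-line switches for a scraping session."""
    args = []
    if headless:
        # Run without a visible browser window (useful on a server).
        args.append("--headless")
    if proxy:
        # Chrome's --proxy-server switch routes traffic through the given proxy.
        args.append(f"--proxy-server={proxy}")
    return args


def make_driver(proxy=None, headless=True):
    """Create a Chrome WebDriver using the switches above (requires Selenium)."""
    # Imported lazily so build_chrome_args stays usable without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in build_chrome_args(proxy=proxy, headless=headless):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)


# Placeholder proxy address, for illustration only.
example_args = build_chrome_args(proxy="http://203.0.113.10:8080")
```

A driver created with `make_driver(proxy=...)` could then be used wherever the script currently creates its Selenium driver.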
Create your virtual environment using the environment.yml file associated with the repo. The script uses the Google Sheets API via the gspread library, Selenium + BeautifulSoup to get the Google SERP, and pandas to handle and compare data. After your environment is set up, you'll need to get service account credentials through Google's Developer Console and save them as client_secret.json in the script's directory. You'll also need chromedriver.exe in the script's path, which you can get from here.
Once you've set up the script, create a new gsheet (example) with the following tabs:
- data (stores all historical data)
- 1 row + 5 columns, with these headers:
Business Name,google_query,kg_image_url,timestamp,change_detected
- image_change_tracking (stores only records where images changed)
- 1 row + 5 columns, with these headers:
Business Name,date_discovered,google_query,new_kg_image_url,old_kg_image_url
- brands_to_query (the brands to query)
- 1 column with this header:
Business Name
Then update the gsheet_workbook_name variable to your sheet's name and share the sheet with your service account, giving it edit privileges (its address will be something like: [email protected]).
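Before the first run, it can help to confirm the workbook matches the layout above. The following is a rough sketch, not part of the script: `EXPECTED_HEADERS`, `find_header_problems`, and `read_headers` are hypothetical names, and `read_headers` assumes the credentials file is client_secret.json and that the sheet has been shared with the service account.

```python
EXPECTED_HEADERS = {
    "data": [
        "Business Name", "google_query", "kg_image_url",
        "timestamp", "change_detected",
    ],
    "image_change_tracking": [
        "Business Name", "date_discovered", "google_query",
        "new_kg_image_url", "old_kg_image_url",
    ],
    "brands_to_query": ["Business Name"],
}


def find_header_problems(actual_headers):
    """Compare {tab name: header row} with the expected layout; return messages."""
    problems = []
    for tab, expected in EXPECTED_HEADERS.items():
        if tab not in actual_headers:
            problems.append(f"missing tab: {tab}")
        elif actual_headers[tab] != expected:
            problems.append(f"unexpected headers in {tab}: {actual_headers[tab]}")
    return problems


def read_headers(workbook_name, credentials_file="client_secret.json"):
    """Read row 1 of each expected tab (requires gspread and network access)."""
    # Imported lazily so find_header_problems works without gspread installed.
    import gspread

    client = gspread.service_account(filename=credentials_file)
    workbook = client.open(workbook_name)
    existing_tabs = {ws.title for ws in workbook.worksheets()}
    return {
        tab: workbook.worksheet(tab).row_values(1)
        for tab in EXPECTED_HEADERS
        if tab in existing_tabs
    }


# Offline illustration: one tab missing, one header misspelled.
sample = {
    "image_change_tracking": EXPECTED_HEADERS["image_change_tracking"],
    "brands_to_query": ["Brand"],
}
problems = find_header_problems(sample)
```

Against a real workbook, the check would be `find_header_problems(read_headers(gsheet_workbook_name))`; an empty list means the tabs and headers line up.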
This thing was originally written 2 years ago. It was an adventure figuring out what everything did a couple of years removed (thank god for comments), and there's some cringeworthy code here; I'll improve it over time (like removing terrible iterators, e.g. range(len(df))). If you have problems, just hit me up on Twitter and/or fork it.