Automating LinkedIn Keyword Analysis with Selenium and Python

Web scraping and sentiment analysis

If your job or project involves scraping the web, then there's one tool you may want to add to your toolkit if you don't know about it already: Selenium.

Here's what we will discuss in this article:

  1. Introduction

  2. Setting up Selenium and the Chrome driver

  3. Logging in to LinkedIn

  4. Searching for keywords on LinkedIn

  5. Loading more content by scrolling down the page

  6. Scraping the post content and calculating sentiment score

  7. Printing the results

  8. Handling exceptions and closing the browser

  9. Time Module

  10. Conclusion

Web scraping has become a popular way of gathering data for analysis and research purposes. In this article, we will be discussing how to write a Python script to scrape LinkedIn posts using the Selenium and TextBlob libraries.

Prerequisites:

Before we proceed, we need to have the following prerequisites:

Basic knowledge of the Python programming language

Python 3 installed on your local machine

The Selenium and TextBlob libraries installed in your Python environment

The Chrome web browser installed

A ChromeDriver executable matching your Chrome browser version
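
If you don't have the libraries yet, they can usually be installed with pip (TextBlob also needs its corpora downloaded before it can do sentiment analysis):

pip install selenium textblob
python -m textblob.download_corpora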

Step 1: Import Libraries

The first step is to import the necessary libraries into our Python script. In this case, we need to import the following libraries:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import time
import random
from textblob import TextBlob

The selenium library is used for web automation, the time and random libraries are used to introduce delays so we don't overload the server with requests, and the TextBlob library is used for sentiment analysis.

Step 2: Initialize the Chrome Driver

Next, we need to initialize the Chrome driver with an Options object. We also pass the --headless argument so the browser runs in the background.

options = Options()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)

Step 3: Navigate to LinkedIn Website and Login

We now need to navigate to the LinkedIn website and log in with our email and password. We can achieve this by using the get() method to navigate to the website and the find_element() method to find the email and password input fields.

driver.get('https://www.linkedin.com/')
time.sleep(random.randint(3, 5))
driver.find_element(By.ID, 'session_key').send_keys('YOUR EMAIL')
time.sleep(random.randint(1, 3))
driver.find_element(By.ID, 'session_password').send_keys('PASSWORD')
time.sleep(random.randint(1, 3))
driver.find_element(By.CLASS_NAME, 'sign-in-form__submit-button').click()
time.sleep(random.randint(3, 5))
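
A quick note: hard-coding your email and password in the script is risky. One common alternative (not part of the original script) is to read them from environment variables, here hypothetically named LINKEDIN_EMAIL and LINKEDIN_PASSWORD:

import os

# Assumes LINKEDIN_EMAIL and LINKEDIN_PASSWORD are set in your shell
email = os.environ['LINKEDIN_EMAIL']
password = os.environ['LINKEDIN_PASSWORD']

driver.find_element(By.ID, 'session_key').send_keys(email)
driver.find_element(By.ID, 'session_password').send_keys(password)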

Step 4: Define Keywords to Search For

We now need to define a list of keywords to search for on LinkedIn. In this case, we will be searching for posts related to Lionel Messi, Cristiano Ronaldo, Neymar, and LeBron James.

keywords = ["Lionel Messi", "Cristiano Ronaldo", "Neymar", "LeBron James"]

Step 5: Navigate to LinkedIn Search Page and Search for Keywords

We need to navigate to the LinkedIn search page and search for the keywords. We can achieve this by using the get() method and joining the keywords into a single query string with the OR operator between them. Because the query contains spaces, it is safest to URL-encode it with urllib.parse.quote.

from urllib.parse import quote

driver.get('https://www.linkedin.com/search/results/content/?keywords=' +
           quote(' OR '.join(keywords)))
time.sleep(random.randint(3, 5))

Step 6: Scroll Down to Load More Content

We now need to scroll down the page multiple times to load more content. We can achieve this by using a for loop and the execute_script() method to scroll to the bottom of the page.

for i in range(50):
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
    time.sleep(random.randint(2, 4))
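
Fifty scrolls is an arbitrary number and may load more or fewer posts than we need. A common alternative (a sketch, not part of the original script) is to keep scrolling until the page height stops growing:

last_height = driver.execute_script('return document.body.scrollHeight')
while True:
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
    time.sleep(random.randint(2, 4))
    new_height = driver.execute_script('return document.body.scrollHeight')
    if new_height == last_height:
        break  # no new content was loaded, so stop scrolling
    last_height = new_height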

Step 7: Scraping the Post Content and Calculating Sentiment Score

Once the posts have been loaded, the next step is to scrape the content of each post and count the number of times each keyword appears in the post. We can use the find_elements() method to find all post elements and loop through them to get their text content. We can then use the count() method to count the number of times each keyword appears in the post text.

In addition to counting the number of times each keyword appears, we can also calculate the sentiment score of each post that mentions a keyword using the TextBlob library. The sentiment score ranges from -1 to 1, where -1 indicates negative sentiment, 0 indicates neutral sentiment, and 1 indicates positive sentiment. We can calculate the sentiment score for each post by reading the sentiment.polarity attribute of the TextBlob object.
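
To get a quick feel for polarity, here is a tiny standalone example (the exact values depend on TextBlob's lexicon):

from textblob import TextBlob

print(TextBlob("What a fantastic performance!").sentiment.polarity)   # a positive value
print(TextBlob("That was a disappointing match.").sentiment.polarity)  # a negative value

With that in mind, here is the scraping and scoring loop: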

count = {keyword: 0 for keyword in keywords}
sentiment_score = {keyword: 0 for keyword in keywords}
post_elements = driver.find_elements(
    By.CLASS_NAME, 'feed-shared-update-v2__description-wrapper')
for post in post_elements:
    post_text = post.text
    for keyword in keywords:
        # count() is case-sensitive, so only exact matches are counted
        count[keyword] += post_text.count(keyword)
        # The mention check, by contrast, is case-insensitive
        if keyword.lower() in post_text.lower():
            post_sentiment = TextBlob(post_text).sentiment.polarity
            sentiment_score[keyword] += post_sentiment
    time.sleep(random.randint(1, 3))

# Average the accumulated sentiment over the number of mentions
for keyword, keyword_count in count.items():
    if keyword_count > 0:
        sentiment_score[keyword] /= keyword_count

Step 8: Printing the Results

Finally, we can print the count and the average sentiment score for each keyword by looping over count.items() with the print() function. If a keyword was not mentioned in any of the posts, we print a message indicating that instead.

for keyword, keyword_count in count.items():
    if keyword_count > 0:
        print(
            f"{keyword} was mentioned {keyword_count} times "
            f"with a sentiment score of {sentiment_score[keyword]:.2f}.")
    else:
        print(f"{keyword} was not mentioned in any of the posts.")

Step 9: Handling Exceptions and Closing the Browser

We also need to handle exceptions that may occur during the execution of the script. For example, if the website is down or there is a network issue, an exception may be raised. The entire script is wrapped in a try-except block that catches any exception and prints an error message.

Finally, we need to close the browser after the script has finished executing. We can do this by calling the quit() method of the driver object.

driver.quit()
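
Putting it all together, the overall shape of the script might look like the sketch below. The helper function run_linkedin_keyword_analysis() is hypothetical, standing in for Steps 3 to 8 above, and a finally block guarantees the browser closes even if an error occurs:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)

try:
    # Steps 3-8: log in, search, scroll, scrape posts, print results
    run_linkedin_keyword_analysis(driver)  # hypothetical helper
except Exception as e:
    print(f"An error occurred: {e}")
finally:
    driver.quit()  # always release the browser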

Now let's talk about why the time module is so important.

The time module is a built-in Python module that provides various functions to work with time-related tasks. In the context of the LinkedIn keyword analysis script, the time module is used to introduce delays between different actions performed by the Selenium WebDriver. The reason for introducing delays is to simulate human-like behavior and avoid overwhelming LinkedIn's servers with too many requests in a short amount of time.

Introducing Random Delays

The script uses the random module to generate random integer values that are used as the lengths of the delays. Randomizing delays this way, sometimes described as adding "jitter," helps to avoid being detected as a bot by LinkedIn's security measures.
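
For example, the repeated time.sleep(random.randint(...)) calls could be wrapped in a small helper (a hypothetical human_delay() function; random.uniform allows fractional pauses, which look a little more natural than whole seconds):

import random
import time

def human_delay(low=1.0, high=3.0):
    """Pause for a random, human-looking interval between low and high seconds."""
    time.sleep(random.uniform(low, high))

# Usage: replace a fixed sleep between actions
human_delay(3, 5)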

Waiting for Page Load

The script also uses time.sleep() to wait for the page to load after performing an action. For example, after entering the login credentials and clicking the sign-in button, the script waits for the page to load before proceeding with the next action.

We also used time.sleep() to wait for the content to load after scrolling down the page. The driver.execute_script() function is used to scroll down the page to load more content, and then the script waits for a random interval before proceeding to the next action.

By introducing these random delays, the script behaves more like a human user and avoids triggering LinkedIn's security measures that could block the script.
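
One caveat: fixed sleeps either wait longer than necessary or not long enough on a slow connection. A more robust alternative, not used in the script above, is Selenium's explicit waits, which poll the page until a condition is met:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 15 seconds for the post elements to appear
wait = WebDriverWait(driver, 15)
posts = wait.until(EC.presence_of_all_elements_located(
    (By.CLASS_NAME, 'feed-shared-update-v2__description-wrapper')))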

Conclusion

In this article, we have discussed how to use Python and Selenium to scrape data from LinkedIn. We have gone through each step of the process, from setting up the Selenium driver to scraping the post content and calculating sentiment scores. We have also discussed how to handle exceptions and close the browser after the script has finished executing. With this knowledge, you can now use Python and Selenium to scrape data from LinkedIn and other websites for your own projects.