Classifying Tweets: Twitter API V2, OpenAI API, and Python

In this article, we will build an application that classifies tweets retrieved from the Twitter API V2 using the OpenAI classification API. The application combines two services:

The OpenAI API, which performs the classification.
The Twitter API V2, which we use to search for the tweets to classify.

This lesson includes a YouTube video that walks you through each step as well as a GitHub repo with the code.

GitHub Starting Code Repo: Skolo Online/twitter-V2-api

GitHub Finishing Code Repo: Skolo Online/twitter-search-API

YouTube Tutorial Video:

To classify tweets with the OpenAI API, we will complete the following steps:

Download the starting code — the Flask Application
Create AI training document
Upload training document to OpenAI
Classify tweets
Update the Flask application with classification functionality

Download the starting code — the Flask Application

Clone the repo:

git clone https://github.com/skolo-online/twitter-v2-api.git

Fill in your API keys in the config file (config.py), install the required pip libraries, and start the app. Confirm that the app from the Twitter lesson loads; everything in this lesson builds on it.
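All the code in this lesson reads its credentials from a config module. A minimal config.py sketch, assuming you load the keys from environment variables (the environment variable names are our own choice, not taken from the repo), could look like this:

```python
# config.py: a minimal sketch. The attribute names match the ones used in this
# lesson; reading them from environment variables is an assumption.
import os

# Twitter API V2 credentials from the Twitter developer portal
BEARER_TOKEN = os.environ.get("TWITTER_BEARER_TOKEN", "")
API_KEY = os.environ.get("TWITTER_API_KEY", "")
API_KEY_SECRET = os.environ.get("TWITTER_API_KEY_SECRET", "")
ACCESS_TOKEN = os.environ.get("TWITTER_ACCESS_TOKEN", "")
ACCESS_TOKEN_SECRET = os.environ.get("TWITTER_ACCESS_TOKEN_SECRET", "")

# OpenAI API key
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "")
```

Keeping the keys out of the source file also means they are not committed to Git by accident.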

Create AI training document

All AI models must be trained for the task they are meant to perform. OpenAI provides pre-trained models, but some tasks are too specific for them and require targeted training.

Classifying tweets is one of those tasks, because the right categories depend on the use case. In this example, we will categorize tweets into three groups:

Positive
Negative
Neutral

To teach the model how to classify tweets, we must build a file of example tweets, each labeled with one of these groups. The file shows the model what a positive, a negative, and a neutral tweet look like.

The more varied the examples, the better the model classifies new tweets. The examples must also resemble the real tweets that will be categorized.

For that reason, we retrieve real tweets from Twitter with our API V2 Tweepy code and then label them manually.

import json

import tweepy

import config


def getClient():
    # Authenticate against the Twitter API V2 with the keys from config.py
    client = tweepy.Client(bearer_token=config.BEARER_TOKEN,
                           consumer_key=config.API_KEY,
                           consumer_secret=config.API_KEY_SECRET,
                           access_token=config.ACCESS_TOKEN,
                           access_token_secret=config.ACCESS_TOKEN_SECRET)
    return client


def searchTweets(client, query, max_results):
    # max_results must be between 10 and 100 for recent search
    tweets = client.search_recent_tweets(query=query, max_results=max_results)

    results = []
    # tweets.data is None when the search returns no tweets
    for tweet in tweets.data or []:
        results.append({'id': tweet.id, 'text': tweet.text})

    return results


def writeJsonLinesFile(items, filename='tweet-training.jsonl'):
    # Write one JSON object per line (the JSON Lines format)
    with open(filename, 'w', encoding='utf-8') as f:
        for item in items:
            f.write(json.dumps(item))
            f.write('\n')


def return50Tweets(query):
    # Only English tweets, and no retweets
    query = '{} lang:en -is:retweet'.format(query)

    client = getClient()
    tweets = searchTweets(client, query, 50)

    # Keep only the text; the labels are added by hand afterwards
    objs = [{'text': tweet['text']} for tweet in tweets]

    writeJsonLinesFile(objs)

Call the function at the end of the file, passing the search term you want to collect tweets for:

return50Tweets('Polokwane')  # replace with your own search term

Running the file saves the retrieved tweets to tweet-training.jsonl in the root of your project. Each line of the file is formatted as follows:

{"text": "@Tbose01 I\u2019m attending a funeral in Polokwane \ud83d\ude01"}
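The \u sequences in this line are JSON escapes: json.dumps escapes non-ASCII characters by default (ensure_ascii=True), so the apostrophe and the emoji are stored as escape codes. A small sketch (readJsonLines is our own hypothetical helper) shows that they decode back to the original characters when the file is read:

```python
import io
import json


def readJsonLines(f):
    """Parse a JSON Lines stream into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in f if line.strip()]


# A one-line sample, written exactly as json.dumps stores it on disk
sample = io.StringIO(
    '{"text": "@Tbose01 I\\u2019m attending a funeral in Polokwane \\ud83d\\ude01"}\n'
)
records = readJsonLines(sample)

# \u2019 decodes to a right single quote, and the surrogate pair
# \ud83d\ude01 decodes to a single emoji character (U+1F601)
assert records[0]["text"].endswith("Polokwane \U0001F601")
```

To read the real file, pass an open file object: with open('tweet-training.jsonl', encoding='utf-8') as f: records = readJsonLines(f).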

Next, label each line by hand by adding a “label” key set to Positive, Negative, or Neutral:

{"text": "@Tbose01 I\u2019m attending a funeral in Polokwane \ud83d\ude01", "label": "Negative"}
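Because the Flask route later compares labels with exact strings such as 'Positive', a typo like "positive" silently puts a tweet in no group. A minimal sketch of a checker (checkTrainingLines and the sample tweets are hypothetical) that reports malformed lines and counts the examples per label, so you can also see whether one group is under-represented:

```python
import json
from collections import Counter

# The three labels used in this lesson (case-sensitive)
ALLOWED_LABELS = {"Positive", "Negative", "Neutral"}


def checkTrainingLines(lines):
    """Return (problems, label_counts) for an iterable of JSON Lines strings."""
    problems = []
    counts = Counter()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append(f"line {number}: invalid JSON ({err.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {number}: not a JSON object")
            continue
        if not record.get("text"):
            problems.append(f"line {number}: missing 'text'")
        label = record.get("label")
        if label in ALLOWED_LABELS:
            counts[label] += 1
        else:
            problems.append(f"line {number}: unexpected label {label!r}")
    return problems, counts


# Made-up sample lines for illustration
sample = [
    '{"text": "Loving the sunshine today", "label": "Positive"}',
    '{"text": "Traffic is terrible again", "label": "Negative"}',
    '{"text": "Meeting moved to 3pm", "label": "neutral"}',
]
problems, counts = checkTrainingLines(sample)
print(problems)  # ["line 3: unexpected label 'neutral'"]
```

To check the real file: with open('tweet-training.jsonl', encoding='utf-8') as f: problems, counts = checkTrainingLines(f).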

Upload training document to OpenAI

Once you have finished labeling the file, upload it to OpenAI. The process is documented in the Files section of the OpenAI API reference: OpenAI API

We do this with the following code:

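import openai

import config

# Uses the legacy (pre-1.0) openai Python library interface:
# openai.File and openai.Classification
openai.api_key = config.OPENAI_API_KEY


def uploadClassificationDocument():
    filename = 'tweet-training.jsonl'
    # Upload in binary mode; purpose="classifications" marks the file
    # for use with the Classifications endpoint
    with open(filename, 'rb') as f:
        response = openai.File.create(file=f, purpose="classifications")
    return response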

Run the code:

response = uploadClassificationDocument()
print(response)

Save the response, which should look similar to this:

# {
#   "bytes": 9401,
#   "created_at": 111111111111,
#   "filename": "tweet-training.jsonl",
#   "id": "file-1234567890123456",
#   "object": "file",
#   "purpose": "classifications",
#   "status": "uploaded",
#   "status_details": null
# }
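The classification step needs the "id" from this response. Instead of copying it by hand, you could persist it to disk. A minimal sketch, assuming a local file named classification_file.json (a hypothetical name, as are saveFileId and loadFileId); the dict below stands in for the real upload response:

```python
import json
import os
import tempfile

# Hypothetical file for storing the uploaded file's id between runs
ID_FILE = "classification_file.json"


def saveFileId(response, path=ID_FILE):
    """Store the 'id' from the upload response."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"id": response["id"]}, f)


def loadFileId(path=ID_FILE):
    """Read back the id saved by saveFileId."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)["id"]


# Demo in a temporary directory, with a dict standing in for the response
path = os.path.join(tempfile.mkdtemp(), ID_FILE)
saveFileId({"id": "file-1234567890123456"}, path)
print(loadFileId(path))  # file-1234567890123456
```

In the real script you would call saveFileId(uploadClassificationDocument()) once, then set class_file = loadFileId() before classifying.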

Classify Tweets

Keep a record of the “id” value, because the classification code needs it:

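# Same file as uploadClassificationDocument, so openai is already
# imported and openai.api_key is already set

# The "id" returned when the training file was uploaded
class_file = "file-1234567890123456"


def classifyTweet(query):
    # query is the tweet text to classify. The search model ("ada") ranks
    # labeled examples from the file (at most max_examples of them), and the
    # completion model ("curie") predicts the label from the best matches.
    response = openai.Classification.create(
            file=class_file,
            query=query,
            search_model="ada",
            model="curie",
            max_examples=3)

    return response['label']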
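The route added to app.py in the next section calls search.retrieveClasiffyTweets(query, max_results), which this guide does not show. One way to write it, sketched under the assumption that it combines searchTweets and classifyTweet from above, is to label each tweet and store the label under the 'classx' key the route reads. addClassifications is a hypothetical helper; the classifier is passed in so the example can run with a stand-in instead of a live API call:

```python
def addClassifications(tweets, classify):
    """Return copies of the tweet dicts with a 'classx' label added.

    `classify` is any function that maps tweet text to a label,
    for example classifyTweet.
    """
    results = []
    for tweet in tweets:
        labelled = dict(tweet)  # copy, so the caller's dicts are not mutated
        labelled["classx"] = classify(tweet["text"])
        results.append(labelled)
    return results


def fakeClassify(text):
    # Stand-in for classifyTweet so the example runs without an API call
    return "Positive" if "love" in text.lower() else "Neutral"


tweets = [{"id": 1, "text": "I love Polokwane"}, {"id": 2, "text": "Bus leaves at 9"}]
classified = addClassifications(tweets, fakeClassify)
print([t["classx"] for t in classified])  # ['Positive', 'Neutral']
```

In search.py, retrieveClasiffyTweets could then return addClassifications(searchTweets(getClient(), query, max_results), classifyTweet). This sketch only adds the label; the template also reads username and url, which the tweet dicts would need to carry as well.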

Update the Flask Application with Tweet Classification


The final stage is to integrate the code above into our existing Flask app, so that a user can enter a search term on the front end and see the results sorted into the groups defined in the training file.

The YouTube video embedded at the beginning of this guide covers this step in detail.

The Flask app.py Update

Within the app.py file, we must add a new route for the second page as follows:

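# Assumes app.py already imports request and render_template from flask,
# and imports the search module from the starting repo
@app.route('/classify', methods=["GET", "POST"])
def classify():

    # Default query, used when the page is first loaded (GET request)
    query = 'Polokwane lang:en -is:retweet'

    if request.method == 'POST':
        query = '{} lang:en -is:retweet'.format(request.form['query'])

    max_results = 20

    tweets = search.retrieveClasiffyTweets(query, max_results)

    positiveTweets = []
    negativeTweets = []
    neutralTweets = []

    # Put each tweet into the list that matches its label
    for tweet in tweets:
        if tweet['classx'] == 'Positive':
            positiveTweets.append(tweet)
        elif tweet['classx'] == 'Negative':
            negativeTweets.append(tweet)
        elif tweet['classx'] == 'Neutral':
            neutralTweets.append(tweet)

    # **locals() passes every local variable, including the three lists, to the template
    return render_template('class.html', **locals())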

This route works the same way as the search route on the home page. When you first open the page (a GET request), it runs this default query:

query = 'Polokwane lang:en -is:retweet'
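Both branches of the route build the query the same way: the search term followed by the operators lang:en (English tweets only) and -is:retweet (exclude retweets). A small helper could do this in one place and also fall back to the default term when the form is submitted empty, which the route does not handle. buildQuery is a hypothetical name:

```python
DEFAULT_TERM = "Polokwane"
FILTERS = "lang:en -is:retweet"


def buildQuery(term=None, default=DEFAULT_TERM):
    """Append the English-only, no-retweets operators to a search term.

    Falls back to the default term when the form field is missing or blank.
    """
    term = (term or "").strip() or default
    return "{} {}".format(term, FILTERS)


print(buildQuery())            # Polokwane lang:en -is:retweet
print(buildQuery(" weather "))  # weather lang:en -is:retweet
```

In the route, query = buildQuery(request.form.get('query')) would then cover both the GET and POST cases.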

The route retrieves at most 20 tweets per search (max_results = 20). You can change this to any value from 10 to 100, the range the recent search endpoint accepts.
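If you let users choose the number of results, a value outside that range will be rejected by the API. A minimal sketch (clampMaxResults is a hypothetical helper) that keeps the value within the accepted bounds:

```python
def clampMaxResults(value, low=10, high=100):
    """Keep max_results inside the 10-100 range that recent search accepts."""
    return max(low, min(high, int(value)))


print(clampMaxResults(20))   # 20
print(clampMaxResults(5))    # 10
print(clampMaxResults(500))  # 100
```

Using int() also accepts the string values that HTML forms submit, such as "50".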

The classified tweets are sorted into three Python lists, which are passed to the front-end template:

positiveTweets = []
negativeTweets = []
neutralTweets = []
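A dictionary keyed by label is one alternative to the chain of if statements in the route: each tweet lands in exactly one list, and unexpected labels are skipped. A sketch, where groupByLabel is a hypothetical helper and the tweet dicts are made up:

```python
def groupByLabel(tweets):
    """Split classified tweets into positive, negative and neutral lists.

    Tweets whose 'classx' is not one of the three labels are left out.
    """
    groups = {"Positive": [], "Negative": [], "Neutral": []}
    for tweet in tweets:
        bucket = groups.get(tweet.get("classx"))
        if bucket is not None:
            bucket.append(tweet)
    return groups["Positive"], groups["Negative"], groups["Neutral"]


tweets = [
    {"id": 1, "classx": "Positive"},
    {"id": 2, "classx": "Negative"},
    {"id": 3, "classx": "Neutral"},
    {"id": 4, "classx": "Positive"},
    {"id": 5, "classx": "Unknown"},
]
positiveTweets, negativeTweets, neutralTweets = groupByLabel(tweets)
print([t["id"] for t in positiveTweets])  # [1, 4]
```

In the route, the three empty lists and the loop could then be replaced by a single call to groupByLabel(tweets).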

The Python Flask HTML Template for Classification

The front-end template (class.html) will look like so:

<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <meta name="description" content="">
    <meta name="author" content="Skolo Online Learning">

    <title>Skolo</title>
    <link rel="shortcut icon" type="image/x-icon" href="{{ url_for('static', filename='images/favicon.png') }}">

    <!-- Bootstrap CSS -->
    <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/css/bootstrap.min.css" rel="stylesheet" integrity="sha384-1BmE4kWBq78iYhFldvKuhfTAU6auU8tT94WrHftjDbrCEXSU1oBoqyl2QvZ6jIW3" crossorigin="anonymous">

    <style>
      .bd-placeholder-img {
        font-size: 1.125rem;
        text-anchor: middle;
        -webkit-user-select: none;
        -moz-user-select: none;
        user-select: none;
      }

      @media (min-width: 768px) {
        .bd-placeholder-img-lg {
          font-size: 3.5rem;
        }
      }
    </style>

    <!-- Custom styles for this template -->
    <link href="navbar-top.css" rel="stylesheet">
  </head>
  <body>

<nav class="navbar navbar-expand-md navbar-dark bg-dark mb-4">
  <div class="container-fluid">
    <a class="navbar-brand" href="/">Skolo Online Learning</a>
    <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarCollapse" aria-controls="navbarCollapse" aria-expanded="false" aria-label="Toggle navigation">
      <span class="navbar-toggler-icon"></span>
    </button>
    <div class="collapse navbar-collapse" id="navbarCollapse">
      <ul class="navbar-nav me-auto mb-2 mb-md-0">
        <li class="nav-item">
          <a class="nav-link"  href="/">Search Tweets</a>
        </li>
        <li class="nav-item">
          <a class="nav-link active" aria-current="page" href="/classify">Classify Tweets</a>
        </li>
      </ul>
    </div>
  </div>
</nav>

<main class="container">

  <div class="container mt-5 mb-5">
    <br>
    <h1 class="mb-3">Your Tweets Classified</h1>
    <form class="" action="/classify" method="post">
      <div class="mb-3">
        <label for="query" class="form-label">Enter the Search Query for Twitter</label>
        <input type="text" class="form-control" id="query" name="query" placeholder="Search for .......">
      </div>
      <button type="submit" class="btn btn-primary"> RUN SEARCH QUERY </button>
    </form>
    <br>
    <ul class="nav nav-tabs" id="myTab" role="tablist">
    <li class="nav-item" role="presentation">
      <button class="nav-link active" id="positive-tab" data-bs-toggle="tab" data-bs-target="#positive" type="button" role="tab" aria-controls="positive" aria-selected="true">positive</button>
    </li>
    <li class="nav-item" role="presentation">
      <button class="nav-link" id="negative-tab" data-bs-toggle="tab" data-bs-target="#negative" type="button" role="tab" aria-controls="negative" aria-selected="false">negative</button>
    </li>
    <li class="nav-item" role="presentation">
      <button class="nav-link" id="neutral-tab" data-bs-toggle="tab" data-bs-target="#neutral" type="button" role="tab" aria-controls="neutral" aria-selected="false">neutral</button>
    </li>
  </ul>
  <div class="tab-content" id="myTabContent">
    <div class="tab-pane fade show active" id="positive" role="tabpanel" aria-labelledby="positive-tab">
      <br>
      <div class="card-group mt-5">
        <div class="row">
          {% for tweet in positiveTweets %}
          <!-- Card starts here -->
          <div class="col-lg-4 mb-3">
            <div class="card">
              <div class="card-body">
                <h5 class="card-title">Username: {{tweet.username}}</h5>
                <p class="card-text">{{tweet.text}}</p>
              </div>
              <div class="card-footer">
               <div class="row">
                 <div class="col-lg-6">
                   <a href="{{tweet.url}}"><button class="btn btn-primary btn-block">VIEW TWEET</button></a>
                 </div>
                 <div class="col-lg-6">
                   <a href="/retweet/{{tweet.id}}/"><button class="btn btn-success btn-block">RETWEET</button></a>
                 </div>
               </div>
              </div>
            </div>
          </div>
          <!-- Card ends here -->
          {% endfor %}
        </div>
      </div>
    </div>

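    <div class="tab-pane fade" id="negative" role="tabpanel" aria-labelledby="negative-tab">
      <div class="card-group mt-5">
        <div class="row">
          {% for tweet in negativeTweets %}
          <!-- Card starts here -->
          <div class="col-lg-4 mb-3">
            <div class="card">
              <div class="card-body">
                <h5 class="card-title">Username: {{tweet.username}}</h5>
                <p class="card-text">{{tweet.text}}</p>
              </div>
              <div class="card-footer">
               <div class="row">
                 <div class="col-lg-6">
                   <a href="{{tweet.url}}"><button class="btn btn-primary btn-block">VIEW TWEET</button></a>
                 </div>
                 <div class="col-lg-6">
                   <a href="/retweet/{{tweet.id}}/"><button class="btn btn-success btn-block">RETWEET</button></a>
                 </div>
               </div>
              </div>
            </div>
          </div>
          <!-- Card ends here -->
          {% endfor %}
        </div>
      </div>
    </div>

    <div class="tab-pane fade" id="neutral" role="tabpanel" aria-labelledby="neutral-tab">
      <div class="card-group mt-5">
        <div class="row">
          {% for tweet in neutralTweets %}
          <!-- Card starts here -->
          <div class="col-lg-4 mb-3">
            <div class="card">
              <div class="card-body">
                <h5 class="card-title">Username: {{tweet.username}}</h5>
                <p class="card-text">{{tweet.text}}</p>
              </div>
              <div class="card-footer">
               <div class="row">
                 <div class="col-lg-6">
                   <a href="{{tweet.url}}"><button class="btn btn-primary btn-block">VIEW TWEET</button></a>
                 </div>
                 <div class="col-lg-6">
                   <a href="/retweet/{{tweet.id}}/"><button class="btn btn-success btn-block">RETWEET</button></a>
                 </div>
               </div>
              </div>
            </div>
          </div>
          <!-- Card ends here -->
          {% endfor %}
        </div>
      </div>
    </div>
  </div>
  </div>

</main>

<!-- Bootstrap JS bundle: needed for the tab buttons (data-bs-toggle) to switch panes -->
<script src="https://cdn.jsdelivr.net/npm/bootstrap@5.1.3/dist/js/bootstrap.bundle.min.js"></script>
  </body>
</html>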
