NEW Search Engine Stuff: CBP & USAO Press Releases and Fast Access to CBP Tweeted Images
Every day, three (3) tasks require overnight attention:
- Update the Search Engine results (DAILY: DL 15m/Process 1+hr)
- Scrape 70+ CBP Twitter accounts for text, pictures and videos (DAILY: DL 1+hr/Process 2+hr)
- Download CBP Weekday Press releases (WEEKDAYS: DL 15m/Process 1-3hr)
This does not include maintenance of the in-house software used, random tweets added, or the 8-12 hours of videos and podcasts playing in the background.
Four (4) NEW THINGS
- Soon we will restart processing USAO (United States Attorneys' Offices) press releases. Details as the process restarts.
- Soon our search engine website will have a "dedicated" webpage for links to CBP press releases. Details as the process rolls into place.
- In a few days, we will describe - in detail - how our search engine works and how you can help.
- On the near horizon is access to CBP Tweeted Images. Details below.
TL;DR (Too Long; Didn't Read)
Daily, #BorderObserver will scrape 70+ CBP Twitter accounts for text, images and videos, and make those easily accessible from webpages on our search engine website. Those webpages will include categories, hashtags & timeline threads.
A LITTLE DETAIL
For some time now, the images in CBP press releases have not always been great. So we have had to search for a CBP tweet that very likely has a better-quality image, which can take from 2 to 20 minutes. When this fails, the fallback plan is to backfill with a stock image and/or a map.
Just after midnight EST, 70+ CBP-related accounts will be scraped for the tweets of the previous day, including any pictures and/or videos. Those tweets will then be condensed onto a single webpage on our search engine website, so you will be able to see the previous day's tweets in one place. In addition, if you click on a picture or video, you will be taken to our Flickr repository, where you can download the picture or video.
Tweets and images of interest will be promoted on the search engine results page. A separate webpage will have the daily scraping of CBP tweets.
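As a rough illustration of the midnight step described above, here is a minimal Python sketch of how already-downloaded tweets could be narrowed to the previous day and grouped by account for a single daily page. It is not the actual in-house software. The tweet dictionary layout (`account`, `created_at`, `text`), the function names, and the treatment of "EST" as a fixed UTC-5 offset are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
import re

# Assumption: "EST" is treated as a fixed UTC-5 offset (no daylight saving).
EST = timezone(timedelta(hours=-5))

# Matches hashtags such as #Laredo and captures the word after the "#".
HASHTAG = re.compile(r"#(\w+)")


def previous_day_window(now):
    """Return (start, end) of the calendar day before `now`, in EST."""
    today_start = now.astimezone(EST).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return today_start - timedelta(days=1), today_start


def group_previous_day(tweets, now):
    """Group tweets posted during the previous EST day by account name.

    `tweets` is a list of dicts with hypothetical keys:
    "account" (str), "created_at" (timezone-aware datetime), "text" (str).
    """
    start, end = previous_day_window(now)
    by_account = defaultdict(list)
    for tweet in tweets:
        if start <= tweet["created_at"] < end:
            by_account[tweet["account"]].append(tweet)
    # Oldest first within each account, so the page reads as a timeline.
    for items in by_account.values():
        items.sort(key=lambda t: t["created_at"])
    return dict(by_account)


def hashtags(text):
    """Return the distinct hashtags in a tweet, lowercased and sorted."""
    return sorted({tag.lower() for tag in HASHTAG.findall(text)})
```

Calling `group_previous_day(tweets, datetime.now(EST))` shortly after midnight would return only the tweets for the day that just ended, and `hashtags()` could feed the hashtag pages mentioned in the TL;DR.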
MORE DETAILS
There are some side effects to this process.
- If CBP or Texas DPS tweets about an effective stop, that stop can possibly be timeline-traced through to a USAO or local DA press release - which might take 6 months to 3+ years (to sentencing, if found guilty).
- CBP and CBP-related tweeted images, pictures and videos will be part of an ever-growing, organized repository of STOCK IMAGES.
- If a CBP account tweets about a press release, that tweet will be directly pointed to on the webpage.
- Sometime in the near future, tweeted images related to a press release will be made directly available on that webpage.
A QUICK WORD ON THE SEARCH ENGINE
A few months back, when we launched the press-release-database, it became painfully obvious that we had missed the mark. An obvious takeaway was that a majority of people are STILL USING the current "search engines" (Google, Bing, DuckDuckGo).
However, the real stories are often at the front lines. In Texas & New Mexico, it is Brownsville, Laredo, Eagle Pass, and El Paso. In Arizona, it is Douglas, Nogales, and Yuma. In California, it is Calexico, Tecate, and San Diego.
Nationally, the "search engine" uses - to name a few:
breitbart.com
dailysignal.com
foxnews.com
freebeacon.com
newsmax.com
newsnationnow.com
nypost.com
thegatewaypundit.com
thepostmillennial.com
washingtontimes.com
At the bottom of the search engine results page is the complete list of websites we check. In addition, it contains:
- Keywords List - Words we search for in the Title
- Fetch List - A complete list of news outlets we check - in alphabetic order.
- Display List - A list of news outlets we check - in display order. The order we use on the front page.
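To make the three lists above more concrete, here is a minimal Python sketch of how they might work together: keep only articles whose title contains a keyword, then order the hits by the display list. This is a guess at the general idea, not a description of the actual search engine. The keywords, the display order, the article dictionary layout, and the function names are hypothetical; only the three domain names come from the list of outlets above.

```python
import re

# Hypothetical sample values; the real lists are published at the bottom
# of the search engine results page.
KEYWORDS = ["border", "cbp", "migrant"]
DISPLAY_LIST = ["foxnews.com", "breitbart.com", "nypost.com"]  # front-page order (assumed)
FETCH_LIST = sorted(DISPLAY_LIST)  # same outlets, alphabetic order


def title_matches(title, keywords=KEYWORDS):
    """Return True if any keyword appears as a whole word in the title."""
    words = set(re.findall(r"[a-z0-9']+", title.lower()))
    return any(keyword in words for keyword in keywords)


def select_results(articles, keywords=KEYWORDS, display_list=DISPLAY_LIST):
    """Keep keyword hits from listed outlets, ordered by the display list.

    `articles` is a list of dicts with hypothetical keys "site" and "title".
    """
    rank = {site: position for position, site in enumerate(display_list)}
    hits = [
        article
        for article in articles
        if article["site"] in rank and title_matches(article["title"], keywords)
    ]
    # sorted() is stable, so articles from the same outlet keep their order.
    return sorted(hits, key=lambda article: rank[article["site"]])
```

Matching whole words rather than substrings is a design choice made for this sketch. It avoids false hits on longer words, but it also misses plurals such as "migrants" unless they are added to the keyword list.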
SEARCH ENGINE RESULTS PAGE
https://dailybordernews.github.io/