Detourning the Web

Week 12: America Dumps

During my Vintage Fountains project, I enjoyed the anticipation and reveal of each new fountain. Sure, I could visit the image results page of any search engine, but with that project I found myself spending time with each individual picture, considering the life of the object(s) pictured and the photographer’s decisions. I enjoyed the extremely slowed-down, one-image-at-a-time pace. But that was on Twitter, and right now Instagram rules the photo sharing scene.

I’ve been on Instagram for two weeks now learning how to scrape images from public hashtag pages, only to remix and throw them right back from whence they came (see @autoechoes). In the process of playing, I observed content from some of the more popular hashtags. There’s a tag for nearly everything and plenty of skin, faces, food, and camera-ready landscapes and lifestyles. I found the quantity of posts astonishing: at the time of this writing, over 341 million in #selfie, 435 million in #happy, 520 million in #photooftheday, 746 million in #instagood, and 1.2 billion in #love. It’s a positive place, this Instagramland. (By comparison, only 870,000 posts in #unhappy and 24 million in #sad.) I found the likes and followers an alluring distraction (apparently for some the temptation is too great). I asked friends and colleagues about their Insta experiences. Many shared pics to connect with friends and family and/or to participate in threads related to interests and hobbies. Some commented on self-branders and corporate marketing strategies.

After a week it all started to look about the same (smiley, centered, saturated, squared), and I started to wonder about what I was not seeing. If so many people are using this platform (are you up to one billion yet, Instagram?), then could it be used to bring to light places far removed from folks’ like-radars? Places rarely sought out in real life, much less shared online for followers. Like landfills, for example. Waste of all kinds is universal. Humans have been burying (and sometimes building on top of) their trash for thousands of years. It’s one of the hallmarks of civilization. Why don’t we discuss it more, specifically how it allows society to function…or, with the emergence of the Anthropocene, maybe eventually not so well? Is there an unsustainable cost to coveted #lifestyles?

Launched in honor of Earth Day, @americandumps posts satellite views of some 2,450 solid waste landfills in the United States. Included with each image is its state, latitude and longitude, whether it’s open or closed, and the amount of waste in place* in tons. All data was sourced from Google Maps and the February 2018 Data Files from the Landfill Methane Outreach Program, a voluntary EPA program. LMOP “works cooperatively with industry stakeholders and waste officials to reduce or avoid methane emissions from landfills” by “[encouraging] the recovery and beneficial use of biogas generated from organic municipal solid waste.” According to their database, the total tonnage for sites where that data is available is currently over 11 billion tons of trash.

My project uses two scripts: one to retrieve the satellite image of each site and the other to upload it to Instagram. A bit about my process (all code linked below): 

  1. After retrieving the LMOP data, I added my own ID field, changed state abbreviations to full names, removed spaces from those full names to prep them for the hashtags, and duplicated the latitude and longitude columns, inserting “Data Missing” into the empty fields (also for Instagram caption display). Afterwards, I formatted the file as CSV and then converted it to JSON.
  2. Next, I wrote get_images.py to iterate through each landfill record in the JSON file and call the Google API with the latitude and longitude coordinates of each site. 
  3. With that working, I downloaded all of the images at two different zoom levels, 15 and 16, to compare. Though I prefer the detail at zoom 16, sites are less likely to get cropped at 15. In addition, 15 provides greater context, showing how each landfill is situated within the landscape and its size compared to any surrounding community.** 
  4. Then, I built upload_images.py to retrieve each satellite image and post it to Instagram along with corresponding information. Hashtags were chosen for their relevance and popularity: combined, their total posts sum to over one billion.
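The retrieval step in get_images.py might look roughly like this sketch, which iterates through the landfill JSON and builds one satellite request per site against Google's Static Maps API. The JSON field names (`id`, `latitude`, `longitude`) are my assumptions, not necessarily those in the original script:

```python
import json
import urllib.parse

# Google Static Maps API endpoint; an API key is required.
STATIC_MAPS_URL = "https://maps.googleapis.com/maps/api/staticmap"

def build_map_url(lat, lng, api_key, zoom=15, size="640x640"):
    """Build a satellite-view request URL for one landfill site."""
    params = {
        "center": f"{lat},{lng}",
        "zoom": zoom,
        "size": size,
        "maptype": "satellite",
        "key": api_key,
    }
    return STATIC_MAPS_URL + "?" + urllib.parse.urlencode(params)

def urls_for_sites(json_path, api_key, zoom=15):
    """Yield (site_id, url) pairs, skipping records without coordinates."""
    with open(json_path) as f:
        sites = json.load(f)
    for site in sites:
        lat, lng = site.get("latitude"), site.get("longitude")
        if lat == "Data Missing" or lng == "Data Missing":
            continue  # nothing to map without coordinates
        yield site["id"], build_map_url(lat, lng, api_key, zoom)
```

Downloading each URL and writing the response bytes to disk completes the step; the zoom parameter is what I toggled between 15 and 16.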

Of note, I came across this error early during upload testing:

Request return 400 error!
{u'status': u'fail', u'message': u"Uploaded image isn't in the right format"}

Turns out that Instagram refused photos straight out of Google Maps. Somehow it occurred to me to try opening and saving the images as new files using the Pillow library in get_images.py, and that did the trick.
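The fix amounted to a round-trip through Pillow; a minimal sketch (the function name is mine):

```python
from PIL import Image

def resave_for_instagram(src_path, dst_path):
    """Open the raw Static Maps image and re-save it as a fresh JPEG
    so Instagram accepts the file. Converting to RGB also drops any
    alpha channel, which JPEG cannot store."""
    img = Image.open(src_path)
    img.convert("RGB").save(dst_path, "JPEG", quality=95)
```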

*This report defines waste in place “as all waste that was landfilled in the thirty-year period before the inventory year, calculated as a function of population and per capita waste generation.” 
**Unfortunately I forgot to switch the zoom back to 15 until 157 images were posted.

@americandumps
Code on Github

Week 9: Vintage Fountains

Learning how to automate Twitter status updates this week inspired me to do so with images. Still a reserved social media user, I had an unused email account lying around that presented the perfect opportunity to practice my scraping skills and poke my head into the Twitterverse. 

To some the background story is familiar: in the spring of 1917, Marcel Duchamp anonymously submitted an artwork titled Fountain, signed R. Mutt, to the inaugural exhibition of the Society of Independent Artists, of which he himself was a board member. According to show rules, all submissions would be shown, and all were, except for Fountain, which was deemed not art by the exhibition committee on account of it being a urinal. This was not Duchamp’s first readymade, but it is perhaps one of his best-known pieces and a hallmark of an emerging conceptual art movement.

My project tweets vintage urinals as R. Mutt at #fountain and #arthistory. The images are randomly selected from DuckDuckGo’s image search results. At first I found some success using Selenium and the method I used for Top Rep$ (finding and moving through elements using XPath), but this returned only the first 50 results--most likely a scrolling issue. Indeed, from scrolling down the page and inspecting the last image, I knew that there were ~330 possibilities. Sam reminded me to check the Ajax calls through the browser’s developer console (Network > XHR), and from there I pulled a link within which I found another link that gave me the image sources in JSON-formatted data. I quickly discovered how I could iterate through all of the results by manipulating a value in this URL.
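The JSON results URL pulled from the XHR traffic can be rebuilt programmatically; this sketch assumes the undocumented `i.js` endpoint and parameter names (`vqd`, `s` for the pagination offset) as observed in the browser, all of which may change without notice:

```python
import urllib.parse

# Undocumented image-results endpoint observed in the browser's
# Network > XHR panel; not an official API and subject to change.
DDG_IMAGE_ENDPOINT = "https://duckduckgo.com/i.js"

def build_results_url(keywords, vqd, offset=0):
    """Build the JSON results URL for one page of image results.
    `vqd` is the per-query token DuckDuckGo issues; `offset` is the
    value that can be stepped through to paginate past result 50."""
    params = {"q": keywords, "vqd": vqd, "o": "json", "s": offset}
    return DDG_IMAGE_ENDPOINT + "?" + urllib.parse.urlencode(params)
```

Each page of the returned JSON contains the image source URLs, so walking `offset` upward collects the full ~330 results.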

From there I wrote two scripts: one to search, scan, and download a picture of a vintage urinal, and another to authenticate into and post the photo to @iamrmutt’s Twitter account. The first script stores all image URLs into an array from which a random one is selected, and the associated file is subsequently downloaded to disk. (Update! In retrospect, after running this for a week, I should also store which links are randomly chosen into a text file and check against that to prevent repeat posts.) From the first script, the second script imports the variable containing the filename of the saved photo and then uses the Tweepy library to post it via Twitter’s API. So though I have two scripts, I only have to call one to complete the entire process. (Of note, since I download the photo with the same filename every time, and because my first script will not download an image if the same name already exists, my update status script deletes the file on my local disk after sending it to Twitter to prevent the same image from posting each time.) 
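The repeat-prevention idea from the parenthetical update could be as simple as a text-file log checked on every run; a sketch with hypothetical function and file names:

```python
import os
import random

def pick_unposted(urls, log_path="posted.txt"):
    """Pick a random image URL that has not been posted before and
    record the choice so future runs skip it. Returns None once
    every URL in the list has been used."""
    posted = set()
    if os.path.exists(log_path):
        with open(log_path) as f:
            posted = {line.strip() for line in f if line.strip()}
    candidates = [u for u in urls if u not in posted]
    if not candidates:
        return None
    choice = random.choice(candidates)
    with open(log_path, "a") as f:
        f.write(choice + "\n")  # append so the log survives restarts
    return choice
```

With this in place, the download step could also keep per-URL filenames, removing the need to delete the local file after each tweet.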

Troubleshooting update! Since I drafted this post, two issues arose. The first was that my requests to the original DuckDuckGo URL stopped working with the keywords, vintage urinal. Plopping it into my browser returned a blank page except for, "If this error persists, please let us know: ops@duckduckgo.com." However, after making it plural, I was back in business...for a while, until that broke, and I changed it to urinals vintage... I also received this Tweepy error twice in a row: "tweepy.error.TweepError: [{u'message': u'Error creating status.', u'code': 189}]." Though I was able to post text-only updates at the time, the issue eventually resolved itself after a couple of hours. (Update on the update: Adding the vqd number to the request URL allows me to search with the original keywords, but I have yet to uncover what this value is exactly. Also, I noticed that the Tweepy error occurs when the downloaded image is zero bytes. Could this be because the image no longer exists online?) All good to know for future projects.
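The zero-byte case could be guarded against before the Tweepy call ever happens; a sketch (the function name is hypothetical):

```python
import os

def safe_to_tweet(image_path):
    """Guard against the empty downloads that seemed to trigger
    Tweepy's 'Error creating status' (code 189): only allow uploads
    of files that exist and actually contain data."""
    return os.path.exists(image_path) and os.path.getsize(image_path) > 0
```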

@iamrmutt
Code on GitHub

Week 7: Top Rep$

(March design iteration)

Introducing Top Rep$, 115th Congress Edition, an educational card game inspired by Top Trumps to promote awareness of members of the United States Congress and to prompt dialogue about the current state of campaign finance and federal lobbying, with a specific focus on funds donated in support or spent in opposition by the National Rifle Association (NRA). With the continued congressional gridlock on gun control, Top Rep$ is dedicated to the surviving families of our country's most recent mass shooting in Parkland, Florida.

For the assignment to automatically manipulate scraped images or video, I built a tool to help me learn more about Congress and what monies flow into members' campaign offices. I was initially motivated to understand who favors gun rights in a country with more mass shootings than any other. I know that the United States' relationship with guns is complicated, and this data is simply a slice of a much larger story. Nevertheless I'm curious, and it's a place to start. Though I began at the Federal Election Commission, it wasn't long before I discovered the Center for Responsive Politics, an organization that not only analyzes and creates contribution profiles from the FEC's campaign finance data but also publishes reports on a variety of issues, including gun rights, on their website OpenSecrets.org.

Similar to the original game, each card in Top Rep$ depicts one person, five categories of numerical values, and some brief additional information. Each card (531 in all at the moment of this writing) contains thirteen pieces of data, some scraped and some copied and pasted. I organized all of the data into one Google spreadsheet to populate an InDesign template and generate individual cards for each representative. While the total deck includes both U.S. Senators and House Representatives, it does not include non-voting House members, nor does it account for current House vacancies. I collected all information from March 11-13, 2018.

My representative in the House. (July design iteration)

Here's a breakdown of each data point and the process by which I retrieved it. All code is posted on GitHub, along with some of the raw data, my final spreadsheet, and the digital card files.

  1. Portraits - Thumbnail images for members of the 115th Congress were scraped from Congress.gov and converted to grayscale using a subprocess call to ImageMagick. After quickly receiving a 429 error with my initial Beautiful Soup approach, I switched to Selenium, located each image URL with XPath syntax, and, due to inconsistent file naming, retitled each image with the representative's name. A number of members (~40, mostly recently elected officials) lacked images, so I used this blank person clip art, adding my own neutral gray background.
  2. Party Affiliation - Republican, Democratic, and Independent converted from the abbreviations, R, D and I, on the OpenSecrets' Gun Rights vs. Gun Control spreadsheet.
  3. Office - Senator and Representative converted from the abbreviations, S and H, on the OpenSecrets' Gun Rights vs. Gun Control spreadsheet.
  4. State - State abbreviations extracted from the Distid field on the OpenSecrets' Gun Rights vs. Gun Control spreadsheet. I also learned how to add a custom function to Google Sheets to convert those State abbreviations into their full names using code from Dave Gaeddert.
  5. Names - Collected and compared against several sources: the congress.gov portrait scrape, the OpenSecrets.org directory and their Gun Rights vs. Gun Control data tabulation, and committee assignment information for both chambers (see below).
  6. Committees - Already in table format, I copied and pasted this data from the House of Representatives Committee Information and Senate Committee Assignments. Of note, the Office of the Clerk for the House of Representatives does not list joint committee assignments, but the Senate does, so this information was added to House representatives where appropriate using committee websites as source material. Since historical data is provided on the playing cards, it's worth mentioning that brand-new committees as of 2018 include the Joint Select Committee on Solvency of Multiemployer Pension Plans and the Joint Select Committee on Budget and Appropriations Process Reform.
  7. Years in this Chamber - Again using XPath syntax, I scraped "Years in Service" from the directory at Congress.gov. For some reason about which I'm still unclear, I was unable to pull data from both chambers at once, so instead I wrote two scripts, filtering the chamber in the initial URL request. To calculate the total number of years in the spreadsheet, I extracted the base year from the string and subtracted it from 2018, manually calculating the total for those officials serving nonconsecutive terms in the House. It was not until I started working with the data that I realized that, again, the most recently elected officials (within the last year) were missing, and so their service years were computed by hand. Of note, Top Rep$ lists the number of years served in the current chamber; however, some senators sat in the House before the Senate. Career totals for gun rights support and campaign fundraising account for all of their years as an elected federal representative.
  8. NRA $ Ranking* - Based on NRA Grand Total, see #9 below.
  9. NRA Grand Total - From the OpenSecrets.org Gun Rights vs. Gun Control document, specifically the "NRA spending (115th Congress)" sheet, I pulled data from the field, NRA Grand Total. The NRA is the top gun rights contributor. Notes from the Center for Responsive Politics regarding this data: "These are career totals, and so therefore can go as far back as 1989. NRA direct support includes contributions from the NRA PAC and employees to candidates. Indirect support includes independent expenditures (and electioneering communications) supporting the candidate; opposition is IEs and ECs opposing the candidate. 'Independent expenditures for opponent' is spending by the NRA supporting a candidate OTHER than the member listed (note that could be someone of the same party, if they supported someone else in the primary), and "Indep Expend against opponent" is spending by the NRA opposing a candidate OTHER than the member listed. For the grand total, we summed the direct support + indirect support + indep expend against opponent, and then subtracted indirect opposition and indirect expenditures for the opponent. This produces a grand total, which can be, and often is, negative. A negative value indicates that the NRA tends to oppose this member."
  10. Campaign Committee Fundraising Top Industry - After spending some time on the OpenSecrets congressional directory, I decided to maintain consistency with gun rights data and pull career totals for each member of the 115th Congress, in this case the leading industry and total amount contributed. I was curious to see this landscape over time (how do totals compare for varying amounts of years served?) and in comparison to members' committee work (do the top industries align with their assignments?). I tried and failed to scrape this information successfully: except for the representative's name, the top industry name and total amounts always returned empty strings (even though at one point I figured out how to extract the elements' attribute data). While I located a number of related issues on Stack Overflow, nothing panned out. Fully committed to the project at this point and working to meet a printing deadline, I visited each page individually and copied the information I needed. (My thoughts on this non-automated process below.)
  11. Campaign Committee Fundraising Top Industry Amount - See above.
  12. Campaign Contributor Top Contributor - Same as above but for top contributors.
  13. Campaign Contributor Top Contributor Amount -  Same as above but for top contribution amounts.
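The grayscale step in item 1 can be sketched as a subprocess call to ImageMagick's convert tool; paths here are placeholders, and the original script's exact flags may differ:

```python
import subprocess

def grayscale_cmd(src_path, dst_path):
    """ImageMagick command to convert one portrait to grayscale."""
    return ["convert", src_path, "-colorspace", "Gray", dst_path]

def to_grayscale(src_path, dst_path):
    """Run the conversion via a subprocess call, raising if the
    convert tool reports an error."""
    subprocess.run(grayscale_cmd(src_path, dst_path), check=True)
```

Keeping the command builder separate from the runner makes the call easy to inspect or log before shelling out.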
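The grand-total arithmetic the Center for Responsive Politics describes in item 9 works out to a simple signed sum; a sketch with my own parameter names:

```python
def nra_grand_total(direct_support, indirect_support,
                    indep_expend_against_opponent,
                    indirect_opposition, indep_expend_for_opponent):
    """Grand total per the CRP notes: spending that favors the member
    counts positively, spending that favors an opponent counts
    negatively. The result can be, and often is, negative, indicating
    the NRA tends to oppose that member."""
    return (direct_support + indirect_support
            + indep_expend_against_opponent
            - indirect_opposition - indep_expend_for_opponent)
```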

    *July 2018 Update: For the March design iteration, this category was called Gun Rights and totaled amounts from three columns of the OpenSecrets.org Gun Rights vs. Gun Control document, specifically the "Career Gun $ to 115th Congress" sheet: Total from Gun Rights (Pink), Gun Control Opposed (Blue), and Gun Rights Support (Blue). After reviewing the sheet in July, I realized I had mistakenly interpreted Gun Control Opposed as groups opposed to gun control, when it actually represents groups advocating for gun control that have spent money to oppose the individual. In fact, further investigation reveals that I need to confirm with the Center for Responsive Politics how the Career Gun amounts differ from the NRA spending totals. Until then, the updated card set will only pull amounts from the "NRA Grand Total" and corresponding "Rank" from the dataset.

Additional credits: It's one thing to look at data across a spreadsheet and another to see it take form as individual playing cards in InDesign. To spiffify them I added the Great Seal of the United States 1904 Coat of Arms to the backs and used the fonts Gregory Packaging and Proxima Nova Light.

Considerations
So yeah: I visited each representative's profile at OpenSecrets.org to copy and paste campaign fundraising data. I checked the API when I stumbled onto the site, but I should revisit it now that I have a sense of the data I want. Next time I'll also investigate requesting a custom data report or troubleshoot my initial scraping hurdle in this area. That being said, retrieving the data became a meditation, and I started to visualize the geography of the country through the lens of industry in a way that might have eluded me had I not been so thoroughly steeped in the data (or until I actually played the game a million times). Receive large contributions from Microsoft? You most likely represent the state of Washington. Walmart? Arkansas. Funded by oil and gas companies? You probably hail from large states west of the Mississippi. And so on. In addition, although not surprising, committee assignments often matched with top industries and contributors. Serve on the Committee on Financial Services? There's likely a large bank (or several) on your list. The Senate Committee on Health, Education, Labor, and Pensions? Look for the health professionals or pharmaceuticals/health products industries. As for gun rights support recipients, aside from the expected partisan divide, I have more studying to do. But it does make me wonder about other tangible ways of working with this and similar datasets to see relationships I hadn't otherwise considered. (And for future Ellen: I'm quite intrigued by some of the anomalies tracked on OpenSecrets.org, such as the percentage of contributions received by politicians from out of their home states.)