Datasets

An overview of the data we currently collect.

We currently run a range of scripts that track the political debate in social and traditional media. The goal is to scale these data collections up over time so that they cover more and more of the Swiss political discourse across as many channels as possible.

Twitter Data

  • The Twitter data covers tweets from over 1'700 accounts of parties, organizations and politicians, as well as tweets from any account that contain specific hashtags, all collected through Twitter's official API.
  • Data is collected continuously and currently includes 3.2 million tweets since the beginning of 2019.
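
As a rough illustration of the hashtag-based selection described above, the following sketch checks whether a tweet's text contains any tracked hashtag. The hashtag list, the function name and the matching rule are assumptions made for this example. The actual collection runs through Twitter's official API and is not shown here.

```python
import re

# Hypothetical tracked hashtags; the real list is not given in this overview.
TRACKED_HASHTAGS = {"examplevote", "exampleelection"}

# A hashtag is taken to be "#" followed by word characters.
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def matches_tracked_hashtag(text, tracked=TRACKED_HASHTAGS):
    """Return True if the text contains at least one tracked hashtag.

    Matching is case-insensitive: hashtags found in the text are
    lowercased before being compared with the tracked set.
    """
    found = {tag.lower() for tag in HASHTAG_PATTERN.findall(text)}
    return not found.isdisjoint(tracked)
```

A filter like this could be applied to tweets from accounts outside the tracked list, so that only tweets carrying a relevant hashtag are kept.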

Facebook Data

  • The Facebook data contains only openly accessible (public) Facebook pages of national parties, politicians, newspapers, and other pages and groups, collected through an API made available by CrowdTangle.
  • Data has been collected continuously since the beginning of 2020 and includes over 680'000 posts from 2018 onwards from over 300 different pages.

SRF Arena Subtitles

  • For the SRF Arena debates on TV, we have a collection of subtitles (2010-2019), which allows us to analyse what political actors say.
  • This corpus consists of around 45 shows per year.

Newspaper Articles

  • We collect all newspaper articles from over 85 different newspapers in Switzerland through an API made available by the SMD (Schweizerische Mediendatenbank).
  • Collection began in mid-2019 and includes all articles from every newspaper since the beginning of 2012, which totals over 11 million articles.

Parliamentary Debates and Affairs

  • Since mid-2020 we have collected all parliamentary debates and affairs from the Swiss national parliament through two web scrapers that pick up new debates and affairs.
  • The parliamentary speeches cover all debates of the Council of States and the National Council from the beginning of 2000 until today, which includes over 48'000 debates.
  • The parliamentary affairs cover all affairs filed with the Swiss national parliament from the beginning of 2000 until today, which includes over 39'000 affairs.

Press Releases and RSS-Feeds

  • This data comes from available RSS feeds, or is scraped by web scrapers, from the websites of parties, organizations and the government to track their press releases.
  • Data has been collected continuously for all major parties and organizations since the beginning of 2018 and includes over 50'000 press releases and over 43'000 RSS messages.
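
To give an idea of what reading an RSS feed involves, here is a minimal sketch that extracts the title, link and publication date of each item in an RSS 2.0 document using only the Python standard library. The function name, the sample feed and its URL are made up for this example, and the project's own collectors may work differently.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text):
    """Return a list of dicts with title, link and publication date
    for every <item> element in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns the default when the child element is missing
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "published": item.findtext("pubDate", default="").strip(),
        })
    return items


# Made-up sample feed, for illustration only.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example party news</title>
    <item>
      <title>Example press release</title>
      <link>https://example.org/press/1</link>
      <pubDate>Mon, 01 Jan 2018 09:00:00 +0100</pubDate>
    </item>
  </channel>
</rss>"""

items = parse_rss_items(SAMPLE_FEED)
```

In practice the XML text would be downloaded from each feed URL at regular intervals, and items already stored would be skipped, for example by comparing their links.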