Right Relevance (RR) provides curated information and intelligence on ~50 thousand topics. This includes:
- Topic relationships including related topics & semantic information like synonyms.
- Topical influencers (~2.5M) with score and rank.
- Topical content and information in the form of articles, videos and conversations.
Additionally, Right Relevance provides an Insights offering that combines the above Topics and Influencers information with real-time conversations to provide actionable intelligence, with visualizations, to support decision making. The Insights service is applicable to emerging events such as elections, conferences, emerging technologies, and product launches.
This report is a summary of graph analysis of engagements and conversations including retweets, mentions and replies of tweets related to the subject of ‘drug pricing’.
The report leverages public data, including tweets sampled from March 7th to March 27th, 2017, and Right Relevance topics, topical influencer communities, and article information from the web (over 2M sites), especially the social web and Wikipedia.
The phrases used for gathering tweets are: “drugpricing”, “drug pricing”, “drugprice”, “drugprices”, “drug price”, “drug prices”, “340B”
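As an illustration of how the phrase list above could be applied, here is a minimal Python sketch of a tweet filter. It assumes case-insensitive, whole-token matching; the report does not say how the collection actually matched the phrases, so the function name and the matching rules are assumptions.

```python
import re

# Tracking phrases quoted in the report.
PHRASES = ["drugpricing", "drug pricing", "drugprice", "drugprices",
           "drug price", "drug prices", "340B"]

# Assumption: case-insensitive matching on whole tokens. The lookarounds stop
# "340B" from matching inside a longer alphanumeric token, while still
# allowing a leading "#" so hashtags such as #DrugPrices match.
_alternatives = "|".join(
    re.escape(p) for p in sorted(PHRASES, key=len, reverse=True)
)
PATTERN = re.compile(r"(?<!\w)(?:" + _alternatives + r")(?!\w)", re.IGNORECASE)


def matches_tracking_phrase(text):
    """Return True if the tweet text contains any tracking phrase."""
    return PATTERN.search(text) is not None
```

A call such as `matches_tracking_phrase("Lower #DrugPrices now")` returns `True`, while text that mentions only the healthcare bill returns `False`.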
Most of the summary report is extracted from the analysis collateral in the form of:
- Tableau Online Dashboard
- Gephi Communities Graph Visual: Extracts are shown below.
For access to Tableau and the complete graphs please send email to email@example.com.
The analysis methodology is outlined at http://188.8.131.52/insights
Community detection graph algorithms like Walktrap and InfoMap are used to identify communities (as sub-graphs) in our engagements graph built using Neo4j & R. Graph visualizations are done via Gephi.
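The engagements graph itself can be pictured with a small Python sketch. The analysis above was built with Neo4j and R, so this is only a stand-in to show the data structure: a directed, weighted graph where an edge means one account retweeted, mentioned, or replied to another. The function names, the `kind` labels, and the account names are hypothetical.

```python
from collections import defaultdict

ENGAGEMENT_KINDS = {"retweet", "mention", "reply"}


def build_engagement_graph(engagements):
    """Build a directed, weighted graph from (source, target, kind) triples.

    An edge source -> target means `source` retweeted, mentioned, or replied
    to `target`; the edge weight counts how many times that happened.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for source, target, kind in engagements:
        if kind not in ENGAGEMENT_KINDS or source == target:
            continue  # skip unknown record types and self-engagements
        graph[source][target] += 1
        graph[target]  # touch the target so it exists as a node
    return {node: dict(neighbours) for node, neighbours in graph.items()}


def retweets_only(engagements):
    """Keep only retweet records, for an RTs-only view of the graph."""
    return [e for e in engagements if e[2] == "retweet"]


# Hypothetical sample records, not taken from the report's data.
sample = [
    ("user_a", "hub", "retweet"),
    ("user_b", "hub", "mention"),
    ("user_a", "hub", "reply"),
    ("user_c", "user_a", "retweet"),
    ("user_c", "user_c", "reply"),
]
all_graph = build_engagement_graph(sample)
rt_graph = build_engagement_graph(retweets_only(sample))
```

Community detection algorithms such as Walktrap or InfoMap would then run on a graph of this shape; they are not reimplemented here.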
The all engagements graph (Fig 1), which includes mentions, shows the highly political nature of the conversations with several scattered communities around major political figures like Donald Trump, Bernie Sanders, Elijah Cummings, Chuck Schumer, Mike Pence, Tom Price, and Rand Paul among others. Media is heavily engaged too.
There seems to be just one cohesive subgraph (in GREEN), i.e. one community, that appears organic and pharma related. We’ll look at it in detail below since it forms a clear flock in our analysis.
The RTs-only graph is fairly sparse and shows much the same picture as the all engagements graph: several scattered conversations around political and media figures, with just one cohesive community.
Finding real influence in this noise is a challenge. The overwhelming political and media involvement makes it hard to identify the people/accounts with organic influence. Doing so requires applying several graph-based techniques that self-organize the data and isolate valuable information, as we will show below.
Latent Dirichlet allocation (LDA) based text analysis of the tweets is used for identifying high value trending terms. These along with hashtags and Right Relevance topics form the basis for identifying top conversation themes during the analysis timeframe.
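To make the LDA step more concrete, the following is a minimal Python sketch of LDA fitted by collapsed Gibbs sampling. It is not the pipeline used for this report: the sampler, the hyperparameters `alpha` and `beta`, the number of topics, and the toy token lists are all assumptions chosen for illustration.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents.

    Returns one {word: count} dict per topic.
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics

    def update(d, word, topic, step):
        # Add or remove one token from all three count tables.
        doc_topic[d][topic] += step
        topic_word[topic][word] += step
        topic_total[topic] += step

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        topics = [rng.randrange(num_topics) for _ in doc]
        for word, topic in zip(doc, topics):
            update(d, word, topic, +1)
        assignments.append(topics)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token, then resample its topic given the rest.
                update(d, word, assignments[d][i], -1)
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][word] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                new_topic = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = new_topic
                update(d, word, new_topic, +1)
    return topic_word


def top_terms(topic_word, n=3):
    """Return the n highest-count words for each topic."""
    return [sorted(counts, key=counts.get, reverse=True)[:n]
            for counts in topic_word]


# Toy token lists for illustration only; these are not real tweets.
toy_docs = [
    ["drug", "prices", "competition", "transparency"],
    ["drug", "prices", "high", "transparency"],
    ["trump", "wiretap", "leaked"],
    ["trump", "impeached", "wiretap"],
]
topics = lda_gibbs(toy_docs, num_topics=2)
terms = top_terms(topics)
```

The highest-count words per topic play the role of the high-value trending terms described above. On a corpus this small the topics are unstable, so the sketch shows the mechanics rather than meaningful results.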
The meeting between Rep. Elijah Cummings and Donald Trump, Donald Trump’s tweet about bringing down drug prices, and Senator Bernie Sanders’s multiple tweets on the issue, along with the GoP healthcare bill, seem to have been the top themes driving conversations during the three weeks we monitored.
Fig 2 shows high drug prices (as expected), competition, exploitative corporate, price transparency as the main trending terms along with Trump related terms like impeached, wiretap, leaked etc.
The top RR topics linked with the conversations are pharmaceutical industry, biotechnology, healthcare, public health, life sciences and politics.
The top hashtags also show a strong healthcare presence, since the GoP healthcare bill conversation was going on in parallel. The bill is related to the drug prices issue in general, although it contains no specific provisions for it.
The top tweets with most engagements were:
Right Relevance ‘engagement influence’ is calculated by measuring the quality and quantity of engagements (RTs, mentions, replies), reach of tweets etc. within the context of a subject (event, trend etc.). The communities formed via this methodology are termed as ‘Flocks’. This “flocking” can lead to building of temporal communities with local influence that can lead to virality not obvious by the standalone influence of the individuals or without the context of the event.
We apply several methods, including PageRank and Betweenness centrality, to measure Flock influence. The meaning of the rankings within this methodology is documented at Twitter Conversation Performance Measures.
The first two lists (Fig 3) are of the top 30 accounts by PageRank & Overall measures. Overall rank is a normalized rank to reduce the skew towards users with large numbers of followers or a single tweet having a large number of engagements/RTs (often referred to as becoming ‘viral’).
PageRank brings up @realDonaldTrump, @POTUS, @SenSanders, @RepCummings, @NPRWeekend, @WhiteHouse, @SecPriceMD, @RandPaul, and @VP, among others. There isn’t much information in this, since the high engagement counts for these accounts overwhelm other, potentially far more interesting accounts. This clearly shows the susceptibility of PageRank to high follower counts.
In this case, the top of the Overall measure, in spite of its normalized nature, shows the same accounts as PageRank in the top 5 due to their high impact. Right after that, however, we notice several accounts like @megtirrell, @DrugPatentWatch, @heatinformatics, @CRBestBuyDrugs, @P4AD_, @HealthyMaryland, @lydiaramsey125 that seem highly influential and organic in the context of the overall drug pricing issue.
These results make other measures important for gauging influence, as discussed below.
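The way a heavily engaged account dominates PageRank can be seen in a small power-iteration sketch in Python. This is a textbook formulation, not Right Relevance's implementation; the damping factor of 0.85 is a common default rather than a value from the report, and the toy graph and account names are hypothetical.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank on a weighted digraph.

    `graph` maps every node to a {target: weight} dict; every target must
    also appear as a key. An edge u -> v means u engaged with v.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out_weight = {v: sum(graph[v].values()) for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        new_rank = {v: (1 - damping) / n + damping * dangling / n
                    for v in nodes}
        for v in nodes:
            for target, weight in graph[v].items():
                new_rank[target] += damping * rank[v] * weight / out_weight[v]
        rank = new_rank
    return rank


# Hypothetical toy graph: four accounts engage with one high-profile account,
# while two domain accounts mostly engage with each other.
toy_graph = {
    "celebrity": {},
    "fan1": {"celebrity": 1},
    "fan2": {"celebrity": 1},
    "fan3": {"celebrity": 1},
    "fan4": {"celebrity": 1},
    "analyst": {"reporter": 1},
    "reporter": {"analyst": 1, "celebrity": 1},
}
ranks = pagerank(toy_graph)
top_account = max(ranks, key=ranks.get)
```

In this toy graph the account that receives the most engagements ends up with the highest score, ahead of the two domain accounts, which mirrors the skew described above.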
The Connectors list is based on Betweenness centrality, which measures the degree to which a node forms a bridge or critical link between other users. We use it as a measure of influence with respect to an account’s value as an information and/or communication hub.
The first thing to notice is that most of the top accounts, as measured by PageRank, are missing.
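As a rough illustration of betweenness centrality, the Python sketch below uses Brandes' algorithm on an unweighted, undirected graph. Treating engagements as unweighted and undirected is a simplifying assumption; the report does not state which variant it uses, and the toy account names are hypothetical.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Brandes' algorithm for an unweighted, undirected graph.

    `adjacency` maps each node to a list of its neighbours. Returns
    unnormalized scores: for each node, the sum over pairs of other nodes
    of the fraction of shortest paths between them that pass through it.
    """
    score = {v: 0.0 for v in adjacency}
    for s in adjacency:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}
        dist = {v: -1 for v in adjacency}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: b / 2 for v, b in score.items()}


# Hypothetical toy graph: two small groups joined by a single account.
toy_adjacency = {
    "a1": ["a2", "bridge"],
    "a2": ["a1", "bridge"],
    "bridge": ["a1", "a2", "b1", "b2"],
    "b1": ["b2", "bridge"],
    "b2": ["b1", "bridge"],
}
scores = betweenness_centrality(toy_adjacency)
```

Here the bridge account scores 4.0, one for each of the four cross-group pairs whose only shortest path runs through it, and every other account scores 0.0, even though the bridge need not have the most followers or engagements.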
@P4AD_, @CRBestBuyDrugs, @megtirrell, @DrugPatentWatch, @heatinformatics, @HealthyMaryland, @lydiaramsey125 were already pushed to the top in the Overall ranking measure due to their high betweenness scores. Several others like @medisien, @MelindaMWedding, @benwakana, @Kathersuch, @ASG_KEI, and @reshmagar are brought to the fore. These accounts seem to have real and organic influence in this domain and betweenness centrality does a great job in bringing them to the top.
The value of this measure lies in that it bubbles up accounts with real influence in terms of news and information dissemination on this subject.
Most major flocks are around political personalities due to the highly politicized nature of the conversations and the engagements.
The only major non-political & non-media cohesive Flock that formed is tagged “statnews”, which is the highest PageRank account in this flock.
Looking at the top RR topics, hashtags and users (Fig 5), this flock seems focused on pharma, drugs, health, biotech and drug pricing related issues and not impacted by the concurrent political narratives.
The subgraph for the flock is clear in the all engagements graph. More interestingly, and confirming the analysis above, when we superimpose the terms pharmaceutical industry, healthcare industry, biotechnology and medical technology on this graph, the accounts that are isolated are overwhelmingly part of the statnews flock. This provides further evidence of the influence of this flock on the subject of ‘drug pricing’.
The top tweets within the statnews flock were:
With each passing analysis, it’s becoming obvious that the scale of and access to data is increasing exponentially, but finding relevant and trustworthy information is becoming progressively harder. Partly this is due to the quantity of data, and partly there seems to be a concerted effort to mis/disinform by overloading the pipes with noise. Finding real influence, relevance and actionable intelligence in this noise is a challenge.
The application of several graph-based techniques helped self-organize noisy data and isolate valuable information, as we saw. We are able to cut through the overwhelming political and media chatter to identify the people/accounts with organic influence and, inductively, information that’s both relevant and trustworthy.
Some high level points:
- Not many cohesive organic communities were formed on Twitter.
- Highly politicized conversations and engagements, made more stark by Trump’s personal involvement along with Senator Sanders’s tweets.
- One active pharma-focused organic community was isolated, leading to identification of sources of real influence and relevant information.
Please contact firstname.lastname@example.org for more details.