Introduction
This project applies social network analysis to Twitch, a popular streaming platform, using a dataset from the Stanford Network Analysis Project (SNAP).
The dataset comprises 168,114 nodes and 6,797,557 edges. It was collected in Spring 2018 from Twitch's public API and is tailored for machine learning applications such as node classification, user behavior prediction, and community detection.
Dataset
Nodes: 168,114
Edges: 6,797,557
File Format: .zip
Purpose: To catalyze research in social network analysis, focusing on Twitch's user dynamics.
Applications of the dataset include:
Node classification
Count data regression
Content streamer identification
Broadcaster language prediction
User lifetime estimation
Churn prediction
Affiliate status identification
View count estimation
Project Process
Graph Construction:
Utilized Python's NetworkX library to build and analyze the network graph.
Parsed the dataset to extract node and edge information, creating a structured representation.
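The parsing step can be sketched roughly as follows. This is a minimal illustration, not the project's actual loader: the in-memory sample, the column names `numeric_id_1` and `numeric_id_2`, and the helper `build_graph` are assumptions made for the example.

```python
import csv
import io

import networkx as nx

# Hypothetical stand-in for an edge-list CSV; the column names are assumptions.
SAMPLE_EDGES_CSV = """numeric_id_1,numeric_id_2
0,1
0,2
1,2
2,3
"""


def build_graph(edge_file):
    """Read an edge-list CSV (one header row, two ID columns) into an undirected graph."""
    reader = csv.reader(edge_file)
    next(reader)  # skip the header row
    graph = nx.Graph()
    for row in reader:
        if not row:
            continue  # ignore blank lines
        source, target = row
        graph.add_edge(int(source), int(target))
    return graph


G = build_graph(io.StringIO(SAMPLE_EDGES_CSV))
print(G.number_of_nodes(), G.number_of_edges())
```

To load a real file, the same helper could be called with an open file handle, for example `build_graph(open(path, newline=""))`, where `path` points at the extracted edge list.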
Visualization:
Employed Matplotlib and NetworkX to visualize the graph.
Adjusted parameters (e.g., node size, color, edge width) for clarity.
Used layout algorithms like spring and circular layouts for enhanced visual appeal.
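A drawing routine along these lines might look like the sketch below. The stand-in graph, the specific sizes and colors, and the fixed `seed` are illustrative assumptions, not the values used in the project.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import networkx as nx

# Stand-in graph; the project would pass its own Twitch subgraph here.
G = nx.complete_graph(10)

# Compute node positions with the two layout algorithms mentioned above.
layouts = {
    "spring": nx.spring_layout(G, seed=42),  # seed fixed for reproducible positions
    "circular": nx.circular_layout(G),
}

fig, axes = plt.subplots(1, 2, figsize=(10, 5))
for ax, (name, pos) in zip(axes, layouts.items()):
    # Small nodes and thin, semi-transparent edges keep dense graphs readable.
    nx.draw_networkx_nodes(G, pos, ax=ax, node_size=120, node_color="tab:purple")
    nx.draw_networkx_edges(G, pos, ax=ax, width=0.5, alpha=0.4)
    ax.set_title(f"{name} layout")
    ax.set_axis_off()

# Render to an in-memory PNG; pass a file path to savefig to write to disk instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=100)
plt.close(fig)
```

In a very dense graph the edges dominate the picture, so lowering `alpha` and `width` usually helps more than changing the layout.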
Analysis:
Explored key metrics: Degree centrality, closeness centrality, betweenness centrality, network diameter, and edge connectivity.
Identified the ultra-small world phenomenon (a network diameter of 1, so every node is one hop from every other) and a single, highly dense community structure.
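The metrics listed above map onto standard NetworkX functions. The sketch below computes them on a 49-node complete graph, which is an assumption inferred from the reported subset size and edge count rather than the project's actual data; `summarize` is a hypothetical helper name.

```python
import networkx as nx


def summarize(graph):
    """Collect the centrality ranges, diameter, and edge connectivity of a connected graph."""
    degree = nx.degree_centrality(graph)
    closeness = nx.closeness_centrality(graph)
    betweenness = nx.betweenness_centrality(graph)
    return {
        "degree_centrality": (min(degree.values()), max(degree.values())),
        "closeness_centrality": (min(closeness.values()), max(closeness.values())),
        "betweenness_centrality": (min(betweenness.values()), max(betweenness.values())),
        "diameter": nx.diameter(graph),  # requires a connected graph
        "edge_connectivity": nx.edge_connectivity(graph),
    }


# Assumed stand-in for the analyzed subset: every pair of the 49 nodes is linked.
G = nx.complete_graph(49)
summary = summarize(G)
for metric, value in summary.items():
    print(metric, value)
```

When the minimum and maximum of a centrality measure coincide, as they do here, the measure cannot rank nodes, which is one way to spot a uniform or saturated sample.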
Results
Network Metrics:
Nodes: 49 (subset)
Edges: 1,176
Average Node Degree: 48
Network Diameter: 1
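These figures can be cross-checked with a few lines of arithmetic. The sketch below uses the standard formulas for a simple undirected graph; the function names are illustrative.

```python
def complete_graph_edges(n):
    """Number of edges in a simple undirected graph where every pair of n nodes is linked."""
    return n * (n - 1) // 2


def average_degree(n, m):
    """Mean degree of an undirected graph with n nodes and m edges."""
    return 2 * m / n


def density(n, m):
    """Fraction of possible edges that are present in a simple undirected graph."""
    return 2 * m / (n * (n - 1))


nodes, edges = 49, 1176
print(complete_graph_edges(nodes) == edges)  # True
print(average_degree(nodes, edges))          # 48.0
print(density(nodes, edges))                 # 1.0
```

A density of 1.0 means no further edges could be added, which is consistent with the reported diameter of 1.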
Key Findings:
Highly interconnected network with rapid information propagation capabilities.
Strong resilience against disruptions due to high edge connectivity.
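Uniform centrality values, indicating potential anomalies or a curated dataset subset. The reported counts match a complete graph (49 × 48 / 2 = 1,176 edges), in which every node has identical centrality.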
Reflections and Learnings
What Went Well
Successfully implemented graph construction and analysis techniques.
Gained deep insights into the dynamics of high-density networks.
Areas for Improvement
Improve data representation and ensure representative samples.
Incorporate dynamic network analysis earlier in the process.
Broaden the dataset for more comprehensive insights.
Learnings
Enhanced understanding of network density implications, centrality measures, and resilience.
Future Work
Network Analysis: Explore interaction patterns, identify influencers, and uncover clusters.
Content Preference Analysis: Analyze user-content relationships for content strategy insights.
Community Detection: Use algorithms to find and understand subgroups within Twitch.
User Behavior Prediction: Build models for predicting user engagement and content popularity.
Anomaly Detection: Identify unusual patterns to maintain platform integrity.
Content Recommendation Systems: Develop graph-based personalized recommendations.
Influence Analysis: Quantify user influence using engagement metrics and centrality measures.
Dynamic Network Analysis: Study the evolution of the Twitch network over time.
Dataset Access
The dataset can be accessed at the SNAP Project Website.
Authors: Dhruv Singh, Kunal Samant
License: MIT
Contact: dsingh28@hawk.iit.edu, ksamant@hawk.iit.edu




