ECS637U/ECS757P - Digital Media and Social Networks
Group Project

This coursework should be done in groups of 3-4 people and is assessed in two deliverables (deadlines below). The deadline for forming groups is 10pm on 9 February. After this deadline, students without a group will be allocated to one, and groups that are not “full” will be automatically assigned new members. The total grade weight for this coursework is 30% (15% for deliverable 1 and 15% for deliverable 2).

NB: You may only form groups with students at the same level (groups may not contain a mix of undergraduate and postgraduate students).

Amendment History
Date | Description | Person
Feb 20th | Clarifications to description | Mathieu Barthet
Feb 24th | Further breakdown of marking guidelines for Lightning Talks and minor clarifications | Laurissa Tokarchuk
Mar 11th | Amended final date to match QMPlus hand in on main page | Laurissa Tokarchuk
Mar 24th | Further extension to the final hand in | Laurissa Tokarchuk

Preliminary work

As a group, ensure that each of you has read three or four of the essential reading papers listed at the end of each week’s lectures. Note that we don’t always discuss all of the results of these papers during lectures. [Optional for UG, required for PG: Read beyond the essential papers, for example more recent papers that cite a paper discussed in class.]

Meet together as a group to discuss these. Questions you may want to consider are:
● What is the technical content of the paper?
● What are the strengths and weaknesses?
● How have they done their analysis?
● What other kinds of datasets could be analysed using these steps?

Choose either one of these papers to follow as an analysis guideline OR one of the ideas posted in the coursework section on QMPlus. For the first deliverable (described below) you should think about how this applies to your dataset (see below) and what you expect to find.
You are not expected to have completed the analysis for this deliverable.

Your group should select an existing public dataset or collect a new network dataset. The dataset must be >200 and can be taken from a public source such as SNAP (http://snap.stanford.edu/data/), the Index of Complex Networks (https://icon.colorado.edu/#!/networks?platform=hootsuite), the Network Repository (http://networkrepository.com/) and others. In choosing your dataset, bear in mind the papers you have read: which dataset (not used in the original analysis) would be suitable for a similar kind of analysis to the one presented? For example, if you chose the Onnela paper discussed in week 2, you would need to pick a dataset that has a concept of tie strength.

Discuss your ideas, datasets, etc. with the lecturers/teaching assistants for the module. From the 10th of February, all demonstrators will have bookable time for short chats (see QMPlus forum).

Deliverable 1: Lightning talks hand-in (Feb 28th), and Lightning Talk Day (March 2nd and March 6th) - Grade weight: 15%

A lightning talk is a short pitch that articulates a topic in a quick, insightful, and clear manner. For this coursework, the lightning talk should include the following content (2 to 3 slides):
● title of your study, your names, coursework information and year
● a brief introduction to your dataset
● a very brief presentation of the study in the original dataset paper and/or paper(s) that have inspired your proposed analysis
● what you plan to do in deliverable #2 in terms of network analysis for your dataset: briefly describe the proposed analysis, how it compares to analyses from reference paper(s), and what you expect the results will look like

The talk should last no more than 4 minutes, so “PRACTICE PRACTICE PRACTICE” is the key. All members of the group should speak.
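As an early sanity check on a candidate dataset, basic statistics such as node and edge counts, the degree distribution and the average clustering coefficient can be computed directly from an edge list. The following is a minimal Python sketch using only the standard library; it is not part of the official coursework materials. It assumes an undirected, unweighted graph given as pairs of node identifiers, and the helper names (build_graph, edge_count, degree_distribution, average_clustering) are hypothetical. It also assumes that the >200 size requirement refers to the number of nodes, which the brief does not state.

```python
from collections import Counter, defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs.

    Self-loops and duplicate edges are ignored, so a node that only
    appears in a self-loop will not be present in the result.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def edge_count(adj):
    """Number of undirected edges (each edge is stored twice in adj)."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2


def degree_distribution(adj):
    """Map each degree k to the number of nodes that have degree k."""
    return Counter(len(nbrs) for nbrs in adj.values())


def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Every neighbour-neighbour link is seen from both endpoints, hence // 2.
    links = sum(1 for u in nbrs for v in adj[u] if v in nbrs) // 2
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficient over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


if __name__ == "__main__":
    # Toy graph: a triangle (a, b, c) with one pendant node d attached to c.
    toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
    adj = build_graph(toy_edges)
    print("nodes:", len(adj))
    print("edges:", edge_count(adj))
    print("degree distribution:", dict(degree_distribution(adj)))
    print("average clustering:", average_clustering(adj))
    # Size check, under the assumption that the threshold counts nodes.
    print("more than 200 nodes:", len(adj) > 200)
```

Tools such as Gephi or igraph report the same quantities and scale better to large datasets; a hand-written version like this is mainly useful for checking that you understand what each statistic measures.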
What to hand in:
Absolute maximum of 6 slides:
● 1 slide: Title slide (introducing your group members/topic).
● 2 to 3 slides that your group will use to talk to (see content above).
● 2 supporting slides containing the basic network statistics.
● Presentations should be either PDF slides or PPT. These slides should all be submitted as one submission.

Lightning Talk Session information:
Times:
Session V1: 4pm - 6pm, Skeel LT, 2nd March.
Session V2: 4pm - 6pm, Eng 3.25, 2nd March.
Session V3: 2pm - 3pm, Skeel LT, 6th March.

Part of the goal of the lightning talk is to give you a chance to explore what some of the other groups have come up with.

NOTE: Presentations will be marked in your allocated session. Any groups with no members present on presentation day will not be graded.

Talks will be graded as follows (based on both the slides and the presentation):
[Presentation] Clarity of description of previous work | 10%
[Presentation] Suitability of the dataset for the analysis presented | 15%
[Presentation] Evaluation methodology proposed | 15%
[Presentation] Ability to keep to time | 5%
[Slides] Style, writing and clarity of layout | 20%
    1. Style (/7)
    2. Writing (/7)
    3. Clarity of layout (/6)
[Slides] Dataset statistics | 35%
    1. The dataset, description, source, visualisation and nodes/edges are presented. (/10)
    2. Appropriate range of statistics presented. (/20) (Suggested ones are degree distribution, clustering coefficient, modularity and centrality, but others are acceptable.)
    3. Evidence of these being calculated. (/5) (Gephi graphs, code snippets or screenshots, etc.)

All attending members of the project team will receive the same grade; members not present without a valid EC will receive 0.

Estimated number of study hours for this coursework deliverable including lectures and seminars: 22.5 hours

Deliverable 2: April 20th - Grade weight: 15%

Your group should write a short paper (maximum length: 6 pages excluding references) using the IEEE conference format https://www.ieee.org/conferences/publishing/templates.html.
All reports should contain (but are not limited to) sections such as introduction, related work, dataset, approach, results, conclusions and references. The author listing at the top should detail the contributions of each author to the project as a whole and a suggested percentage division based on that. Your individual grades will be adjusted according to contribution:

Jane Doe | [email protected] | Problem formulation, algorithm implementation, report writing. | 35%
Bill Smith | [email protected] | Data acquisition and pre-processing, graph drawing, report writing. | 25%
Leanne Kains | [email protected] | Running tests, comparing algorithms, tabulating results, report writing. | 40%

The results should be commented on and justified. It is not sufficient to simply list results; you should derive conclusions.

What to hand in:
● Your 6 page paper.
● Appendix IF:
○ You implemented your own algorithm (include the code).
○ You collected your data set (include code and details about collection).

Please hand the Appendix in as part of your paper (if you have an appendix you can go beyond 6 pages). QMPlus will only accept one document as your hand in and not a zip.

Reports will be graded as:
Introduction and problem definition | 15%
Related work | 10%
Dataset and algorithm/model description | 30%
Results and findings | 30%
Style and writing | 15%

Groups should submit to QMPlus the PDF paper as detailed above.

Other information

You are free to use any data analysis tool, e.g. R, Matlab, Gephi, and can even calculate measurements by yourselves.

Supporting materials:
1. R and Data Mining: Yanchang Zhao, Chap 10&11, http://www.rdatamining.com/home
2. R language: Computing for Data Analysis https://www.coursera.org/course/compdata
3. iGraph with R: http://www.r-bloggers.com/network-visualization-in-r-with-the-igraph-package/
4. Gephi: Gephi - The Open Graph Viz Platform https://gephi.org/
Network analysis with Gephi: Gephi Tutorial - How to use Gephi for Network Analysis

Estimated number of study hours for this coursework deliverable including lectures and seminars: 22.5 hours

Some papers to start (more in notes):

Structure and tie strengths in mobile communication networks. J. P. Onnela, J. Saramaki, J. Hyvonen, G. Szabo, D. Lazer, K. Kaski, J. Kertesz, A. L. Barabasi. Proceedings of the National Academy of Sciences, Vol. 104, No. 18. (13 Oct 2006), pp. 7332-7336.

Maintained relationships on Facebook. Cameron Marlow, Lee Byron, Tom Lento, and Itamar Rosenn. 2009.
Online at https://www.facebook.com/notes/facebook-data-science/maintained-relationships-on-facebook/55257228858/

Social networks that matter: Twitter under the microscope. Bernardo A. Huberman, Daniel M. Romero, and Fang Wu. First Monday, 14(1), January 2009.

Tweet the debates: understanding community annotation of uncollected sources. David A. Shamma, Lyndon Kennedy, and Elizabeth F. Churchill. 2009. In Proceedings of the first SIGMM workshop on Social media (WSM '09). ACM, New York, NY, USA.

Social features of online networks: the strength of weak ties in online social media. P. Grabowicz, J. Ramasco, E. Moro, J. Pujol, V. Eguiluz. arXiv:1107.4009. July 2011.