Monash University

FIT3152 Data science - Semester 2, 2013

In recent years the world has seen an explosion in the quantity and variety of data routinely recorded and analysed by research and industry, prompting some social commentators to refer to this phenomenon as the rise of "big data," and the analysts and practitioners who investigate the data as "data scientists."

The data may come from a variety of sources, including scientific experiments and measurements, or may be recorded from human interactions such as browsing data or social networks on the Internet, mobile phone usage or financial transactions. Many companies, too, are realising the value of their data for analysing customer behaviour and preferences, recognising patterns of behaviour such as credit card usage or insurance claims to detect fraud, evaluating risk more accurately and increasing profit.

To obtain insights from big data, practitioners require new analytical techniques. These include computationally intensive and interactive approaches such as visualisation, clustering and data mining. The management and processing of large data sets require the development of enhanced computational resources and new algorithms to work across distributed computers.

This unit will introduce students to the analysis and management of big data using current techniques and both open source and proprietary software tools. Data and case studies will be drawn from diverse sources including health and informatics, life sciences, web traffic and social networking, business data (including transactions and customer traffic), and scientific research and experimental data. The general principles of analysis, investigation and reporting will be covered. Students will be encouraged to critically reflect on the data analysis process within their own domain of interest.

Mode of Delivery

Clayton (Day)

Contact Hours

2 hrs lectures/wk, 2 hrs laboratories/wk

Workload requirements

Lectures: 2 hours per week
Tutorials/Lab Sessions: 2 hours per week
Additional work: up to 8 hours in some weeks for completing lab and project work, private study and revision.

Unit Relationships

Prerequisites

FIT1006, ETC1000 or equivalent. (For example BUS1100, ETC1010, ETC2010, ETF2211, ETW1000, ETW1010, ETW1102, ETW2111, ETX1100, ETX2111, ETX2121, MAT1097, STA1010)

Chief Examiner

Campus Lecturer

Clayton

Dr John Betts

Dr Sue Bedingfield

Tutors

Clayton

Mr Rj Chow

Dr Kefeng (Jason) Xuan

Academic Overview

Learning Outcomes

At the completion of this unit students will have:
A knowledge and understanding of:
  • analysing large data sets;
  • data cleansing and preparation;
  • open source and proprietary software for data analytics;
  • techniques and tools for data analytics;
  • validation of results.
Developed attitudes that enable them to:
  • model business problems by transforming them into analytics problems that can be solved using data analytics techniques, and then relate the insights from the analysis back to the original business problem;
  • interpret data within a domain-specific context;
  • understand how data analytics may be used within organisations to understand current practice and identify potential opportunities;
  • appreciate the value of data analytics over traditional statistical analysis and modelling;
  • critically evaluate the limitations and benefits of data analytics.
Gained practical skills to:
  • manage large data;
  • prepare data for analysis;
  • analyse large data sets, in particular textual data sets;
  • construct and test the reliability of predictive models;
  • apply techniques and tools for data analytics.
Demonstrated the communication skills necessary to:
  • frame a business problem in terms of a formulation suitable for the application of data analytics tools;
  • communicate and report analysis and findings.

Unit Schedule

Week | Activities | Assessment
0 | | No formal assessment or activities are undertaken in week 0
1 | Introduction to Data Science. Introduction to R. Review of basic statistics using R | Tutorial Participation assessed weekly
2 | Exploring data using graphics in R |
3 | Analytics and modelling in R |
4 | Data cleansing, consulting, case studies. (Guest Lecture) |
5 | Programming in R | Group Assignment (Initial report) due 30 August 2013
6 | Classification using decision trees |
7 | Comparing classification models, ensemble techniques |
8 | K-Means and hierarchical clustering |
9 | Text analysis |
10 | Scalable algorithms. Map Reduce | Individual Assignment due 11 October 2013
11 | Student Presentations. Students will give a brief presentation of their group project results. | Group Assignment (Final report) due 18 October 2013
12 | Review of the course and exam preparation |
  | SWOT VAC | No formal assessment is undertaken in SWOT VAC
  | Examination period | LINK to Assessment Policy: http://policy.monash.edu.au/policy-bank/academic/education/assessment/assessment-in-coursework-policy.html

*Unit Schedule details will be maintained and communicated to you via your learning system.

Assessment Summary

Examination (2 hours): 60%; In-semester assessment: 40%

Assessment Task | Value | Due Date
Group Assignment | 20% | Initial report due 30 August 2013, Final report due 18 October 2013
Individual Assignment | 10% | 11 October 2013
Tutorial Participation | 10% | Weekly
Examination 1 | 60% | To be advised
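
As an illustration only, the weightings above can be combined into a single final mark as a weighted sum. The Python sketch below assumes each component is marked out of 100 and that the final mark is a plain weighted sum. The function name, the dictionary keys and the example marks are hypothetical, and any hurdle requirements under the Faculty policy are not modelled.

```python
# Weights taken from the Assessment Summary (fractions of the final mark).
WEIGHTS = {
    "group_assignment": 0.20,
    "individual_assignment": 0.10,
    "tutorial_participation": 0.10,
    "examination": 0.60,
}


def final_mark(marks):
    """Return the weighted final mark (0-100) from component marks (each 0-100).

    Assumption: a plain weighted sum; hurdle requirements are not modelled.
    """
    if set(marks) != set(WEIGHTS):
        raise ValueError("marks must cover exactly: " + ", ".join(sorted(WEIGHTS)))
    for task, mark in marks.items():
        if not 0 <= mark <= 100:
            raise ValueError(f"{task} mark must be between 0 and 100")
    return sum(WEIGHTS[task] * marks[task] for task in WEIGHTS)


# Hypothetical marks, for illustration only.
example = {
    "group_assignment": 80,
    "individual_assignment": 70,
    "tutorial_participation": 100,
    "examination": 65,
}
print(round(final_mark(example), 1))
```

With these hypothetical marks the components contribute 16, 7, 10 and 39 marks respectively, which shows how heavily the examination counts towards the total.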

Teaching Approach

Lecture and tutorials or problem classes
This teaching and learning approach allows students to first encounter information at lectures, discuss and explore it during tutorials, and practise in a hands-on lab environment.

Assessment Requirements

Assessment Policy

Faculty Policy - Unit Assessment Hurdles (http://www.infotech.monash.edu.au/resources/staff/edgov/policies/assessment-examinations/unit-assessment-hurdles.html)

Academic Integrity - Please see the Demystifying Citing and Referencing tutorial at http://lib.monash.edu/tutorials/citing/

Assessment Tasks

Participation

  • Assessment task 1
    Title:
    Group Assignment
    Description:
    Students will work in groups to analyse a large data set and report their findings.
    Weighting:
    20%
    Criteria for assessment:
    • Understanding of the real-world problem, and how the data might be used to solve the problem.
    • Cleansing and pre-processing the data.
    • Visual representation of the data, and initial insights into the data. (Initial report at this milestone)
    • Accuracy and reliability of the model.
    • Reporting and communication of results.

    As this is a group project, students in each group will allocate a weighting of the final results to each member of the group based on a consensus estimate of each member's contribution. 

    Due date:
    Initial report due 30 August 2013, Final report due 18 October 2013
  • Assessment task 2
    Title:
    Individual Assignment
    Description:
    Students will individually analyse a data set and report their findings.
    Weighting:
    10%
    Criteria for assessment:
    • Understanding of the problem, and how the data might be used to solve the problem.
    • Cleansing and pre-processing the data.
    • Visual representation of the data, and initial insights into the data.
    • Accuracy and reliability of the model.
    • Reporting and communication of results. 
    Due date:
    11 October 2013
  • Assessment task 3
    Title:
    Tutorial Participation
    Description:
    Students will be assessed on their participation during tutorials.
    Weighting:
    10%
    Criteria for assessment:
    • Participation in tutorials
    • Completion of class exercises
    • Contribution to class discussions
    Due date:
    Weekly

Examinations

  • Examination 1
    Weighting:
    60%
    Length:
    2 hours
    Type (open/closed book):
    Closed book
    Electronic devices allowed in the exam:
    Electronic calculators permitted in the exam.

Learning resources

Monash Library Unit Reading List
http://readinglists.lib.monash.edu/index.html

Feedback to you

Types of feedback you can expect to receive in this unit are:
  • Informal feedback on progress in labs/tutes
  • Graded assignments with comments
  • Solutions to tutes, labs and assignments

Extensions and penalties

Returning assignments

Referencing requirements

As per Faculty policy (referencing for Master coursework and undergraduate - see http://intranet.monash.edu.au/infotech/resources/staff/edgov/policies/units/style-masters-ug-degrees.html), the Unit Guide will include links to the relevant referencing requirements for the unit.

Assignment submission

It is a University requirement (http://www.policy.monash.edu/policy-bank/academic/education/conduct/plagiarism-procedures.html) for students to submit an assignment coversheet for each assessment item. Faculty Assignment coversheets can be found at http://www.infotech.monash.edu.au/resources/student/forms/. Please check with your Lecturer on the submission method for your assignment coversheet (e.g. attach a file to the online assignment submission, hand in a hard copy, or use an online quiz). Please note that it is your responsibility to retain copies of your assessments.

Online submission

If Electronic Submission has been approved for your unit, please submit your work via the learning system for this unit, which you can access via links in the my.monash portal.

Recommended text(s)

W. N. Venables, D. M. Smith. (2013). An Introduction to R. Available from: http://www.cran.r-project.org/doc/manuals/R-intro.pdf.

M. Allerhand. (2011). A tiny handbook of R. SpringerLink (Online service), Online access via Library.

Pang-Ning Tan, Michael Steinbach, Vipin Kumar. (2006). Introduction to data mining. Addison-Wesley.

Luis Torgo. (2011). Data mining with R: learning with case studies. Chapman & Hall/CRC.

Other Information

Policies

Graduate Attributes Policy

Student services

Monash University Library

Disability Liaison Unit

Students who have a disability or medical condition are welcome to contact the Disability Liaison Unit to discuss academic support services. Disability Liaison Officers (DLOs) visit all Victorian campuses on a regular basis.

Your feedback to us

Previous Student Evaluations of this Unit

This is a new unit.

If you wish to view how previous students rated this unit, please go to
https://emuapps.monash.edu.au/unitevaluations/index.jsp
