Source: www.nagarjunauniversity.ac.in
B.Sc DATA SCIENCE SUBJECTS

MEMBERS OF THE BOARD | SIGNATURES

External Member: Prof Ch. Haritha, HOD, Dept of CSE, JNTUK Kakinada
1. Dr. M. Kamala Kumari - Chairman, Dept of CSE, AKNU, RJY
2. Dr. P. Venkateswara Rao - Member, Dept of CSE, AKNU, RJY
3. Mr. M. Simhadri - Member, Lecturer, Aditya Degree College, Kakinada
4. Mr. B N S Gupta - Member, Lecturer, SVKP & Dr. K.S Raju Arts & Science College, Penugonda

PAPER 1: INTRODUCTION TO DATA SCIENCE AND R PROGRAMMING

Objective
Data Science is a fast-growing interdisciplinary field that focuses on the analysis of data to extract knowledge and insight. This course introduces students to the collection, preparation, analysis, modelling and visualization of data, covering both conceptual and practical issues. Examples and case studies from diverse fields will be presented, and hands-on use of statistical and data manipulation software will be included.

Outcomes
i. Recognize the various disciplines that contribute to a successful data science effort.
ii. Understand the processes of data science: identifying the problem to be solved, data collection, preparation, modelling, evaluation and visualization.
iii. Be aware of the challenges that arise in data science.
iv. Be able to identify the type of algorithm to apply based on the type of problem.
v. Be comfortable using commercial and open source tools such as the R/Python languages and their associated libraries for data analytics and visualization.

Unit-I
Defining Data Science and Big Data, Benefits and Uses, Facets of Data, Data Science Process. History and Overview of R, Getting Started with R, R Nuts and Bolts.

Unit-II
The Data Science Process: Overview of the Data Science Process - Setting the research goal, Retrieving Data, Data Preparation, Exploration, Modeling, Data Presentation and Automation. Getting Data in and out of R, Using the readr package, Interfaces to the outside world.
Unit-III
Machine Learning: Understanding why data scientists use machine learning - What is machine learning and why we should care about it, Applications of machine learning in data science, Where it is used in data science, The modeling process, Types of Machine Learning - Supervised and Unsupervised.

Unit-IV
Handling Large Data on a Single Computer: The problems we face when handling large data, General techniques for handling large volumes of data, General programming tips for dealing with large datasets. Case study - Predicting malicious URLs (this can be implemented in R).

Unit-V
Subsetting R objects, Vectorised Operations, Managing Data Frames with dplyr, Control structures, Functions, Scoping rules of R, Coding Standards in R, Loop Functions, Debugging, Simulation.

References
1. Davy Cielen, Arno D. B. Meysman, Mohamed Ali, "Introducing Data Science", Manning Publications, 2016.
2. Roger D. Peng, "R Programming for Data Science", Lean Publishing, 2015.
3. Nina Zumel, John Mount, "Practical Data Science with R", Manning Publications, 2014.
4. Mark Gardener, "Beginning R - The Statistical Programming Language", John Wiley & Sons, Inc., 2012.
5. W. N. Venables, D. M. Smith and the R Core Team, "An Introduction to R", 2013.
6. Tony Ojeda, Sean Patrick Murphy, Benjamin Bengfort, Abhijit Dasgupta, "Practical Data Science Cookbook", Packt Publishing Ltd., 2014.

Student Activity
Students should be able to:
- Create a database and read from and write to it.
- Transfer data to and from CSV and other file types.
- Clean data and make it consistent for any sort of analysis in R.
- Perform statistical analysis on a variety of data.
- Perform appropriate statistical tests using R and visualize the outcome.

Continuous assessment: Students are to be tested on the following questions from each unit.
1. Define Data Science. Discuss any application as an example.
2. What are the main components of R? Explain the basic R commands.
3. Explain the phases in the Data Science Process.
4. What is machine learning? What are the differences between machine learning, artificial intelligence and data science?
5. What are the general techniques to handle large volumes of data?
6. Develop any data visualisation application by creating data frames, applying operations on them and using relevant packages.

BASICS OF R LAB
1) Installing R and RStudio
2) Basic operations in R
3) Getting data into R, Basic data manipulation, Loading Data into R
4) Basic plotting
5) Loops and functions
6) Create Vectors, Lists, Arrays, Matrices, Data frames and perform operations on them.
7) Demonstrate visualization and graphics using visualization packages.
8) Implement Loop functions with lapply(), sapply(), tapply(), apply(), mapply().
9) Explore data using single variables: Unimodal, Bimodal, Histograms, Density Plots, Bar charts
10) Explore data using two variables: Line plots, Scatter Plots, Smoothing curves, Bar charts
11) Explore and implement commands using the dplyr package
12) Generate random numbers and set seed

PAPER 1: INTRODUCTION TO DATA SCIENCE AND R PROGRAMMING
MODEL QUESTION PAPER

Part - A
Answer Any FIVE Questions 5*5=25M
1. What is data science and what are its benefits?
2. Explain the role and stages of data science.
3. What are the goals of data science?
4. How is data retrieved in data science?
5. Explain supervised and unsupervised machine learning.
6. Why do we need machine learning in data science?
7. What is cluster analysis?
8. Explain case studies in the R language.
9. How are functions declared in the R language?
10. Explain vectorized operations in the R language.

Part - B
Answer Any FIVE Questions 5*10=50M
11. How is RStudio installed?
12. What are input and output in the R language?
13. Explain the different stages of data science.
14. How is data moved in and out of the R language?
15. What is machine learning? What is its role in data science?
16. What are the applications of machine learning in data science?
17. Explain the general techniques for handling large volumes of data.
18. What problems are faced when handling large data?
19. What are data frames? Write their significance in the R language.
20. Explain R objects.
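
Illustration for lab exercises 8 and 12
The loop functions and seeded random number generation named in lab exercises 8 and 12 could be practised along the following lines. This is only a minimal base-R sketch, not part of the prescribed syllabus; the data (marks, scores, group) and all variable names are hypothetical and chosen purely for illustration.

```r
# Hypothetical example data: marks for three students (not from the syllabus).
marks <- list(
  asha  = c(72, 85, 90),
  ravi  = c(55, 64, 70),
  meena = c(88, 92, 79)
)

# lapply() always returns a list, one element per input element.
means_list <- lapply(marks, mean)

# sapply() simplifies the result to a named numeric vector where possible.
means_vec <- sapply(marks, mean)

# apply() works over the rows (1) or columns (2) of a matrix.
m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)
col_sums <- apply(m, 2, sum)

# tapply() applies a function to groups defined by a factor.
scores <- c(10, 20, 30, 40)
group  <- factor(c("a", "b", "a", "b"))
group_means <- tapply(scores, group, mean)

# mapply() iterates over several arguments in parallel:
# here rep(1, 3), rep(2, 2) and rep(3, 1).
repeated <- mapply(rep, 1:3, 3:1)

# set.seed() makes random draws reproducible: resetting the same seed
# before each call yields the same sequence of numbers.
set.seed(42)
first_draw <- rnorm(5)
set.seed(42)
second_draw <- rnorm(5)
same_draws <- identical(first_draw, second_draw)
```

Resetting the seed before each draw is what makes first_draw and second_draw match; without the second set.seed() call the two vectors would generally differ.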