<!DOCTYPE html>
<html lang="en-us">
<head>
<meta charset="UTF-8">
<title>Cloudcomputing by rprasanakumar</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" type="text/css" href="stylesheets/normalize.css" media="screen">
<link href='https://fonts.googleapis.com/css?family=Open+Sans:400,700' rel='stylesheet' type='text/css'>
<link rel="stylesheet" type="text/css" href="stylesheets/stylesheet.css" media="screen">
<link rel="stylesheet" type="text/css" href="stylesheets/github-light.css" media="screen">
</head>
<body>
<section class="page-header">
<h1 class="project-name">Cloudcomputing</h1>
<h2 class="project-tagline"></h2>
<a href="https://github.com/rprasanakumar/CloudComputing" class="btn">View on GitHub</a>
<a href="https://github.com/rprasanakumar/CloudComputing/zipball/master" class="btn">Download .zip</a>
<a href="https://github.com/rprasanakumar/CloudComputing/tarball/master" class="btn">Download .tar.gz</a>
</section>
<section class="main-content">
<div>
<div id="bs-example-navbar-collapse-1">
<ul>
<li>
<a href="#home">Home</a>
</li>
<li>
<a href="#team">Team</a>
</li>
<li>
<a href="#Approach">Approach</a>
</li>
<li>
<a href="#SystemImplementation">System Implementation</a>
</li>
<li>
<a href="#Dataset">Dataset</a>
</li>
<li>
<a href="#Framework">Framework</a>
</li>
<li>
<a href="#Illustrative">Results</a>
</li>
<li>
<a href="#Performance">Performance</a>
</li>
<li>
<a href="#Accomplished">Accomplished</a>
</li>
<li>
<a href="#Roles">Roles</a>
</li>
</ul>
<p>
</p>
<div>
<div>
<div>
<a href="index.html">
<img src="http://pngimg.com/upload/book_PNG2117.png">
</a>
</div>
</div>
<div>
<div>
<a id="home"></a>
<h3>
<a id="book-recommendationengine" class="anchor" href="#book-recommendationengine" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Book Recommendation<br>Engine</h3>
<p title="Introduction">
Recommendation systems are an integral part of modern intelligent information systems. Search engines (Google, Bing, Yahoo), Netflix, Amazon, YouTube and others recommend articles or items that might interest their users. To do this well, a system needs informative data about each user and about the items that user has shown interest in. In this project, we developed a stand-alone book recommendation engine that recommends books using user profiles and user ratings. The engine combines two recommendation approaches: <a href="http://infolab.stanford.edu/~ullman/mmds/ch9.pdf">1. Collaborative filtering</a> and 2. User demographic profile (user location and age).
</p>
<p>
<a href="https://github.com/rprasanakumar/CloudComputing">
Download our code from Github
</a>
</p>
<a id="team"></a>
<h3>
<a id="project-team" class="anchor" href="#project-team" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Project Team</h3>
<p>
<br><a href="https://www.linkedin.com/in/lakshmanan-ramu%0A">Lakshmanan Ramu Menal </a> <br><br><a href="https://www.linkedin.com/in/rprasanakumar">Prasanna Kumar Rajendran </a>
<br><br><a href="https://www.linkedin.com/in/senthilkumarkarthikeyan">Senthil Kumar Karthikeyan </a>
</p>
<a id="Approach"></a>
<h3>
<a id="approach" class="anchor" href="#approach" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Approach</h3>
<p>
<b> Collaborative Recommendations: </b> The collaborative recommendation engine is built on the ratings that different users have given to the same books. The only parameter it takes into consideration is each user’s rating.
</p>
<p>
• We grouped all the books each user has rated, for every user, and sorted them in descending order of rating. We ignored each user’s low-rated books.
<br>• This gives, for each user, ordered pairs of books the user is interested in. We then dropped the user information from the pairs.
<br>• Next, we calculated the similarity between every pair of books rated by the same user.
<br>• Finally, for each book, we combined the similarity scores calculated against every other book in the previous step.
<br>• We implemented the collaborative recommendation engine in Java using Hadoop MapReduce.</p>
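<p>The first two steps above can be sketched in plain Java, without Hadoop, so that the example runs on its own. This is only an illustration: the class name <code>BookPairs</code>, its method names, and the rating threshold are assumptions and are not taken from the project code.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not the project's actual class.
final class BookPairs {

    // Keep one user's books rated at or above minRating (an assumed threshold),
    // ordered by rating, highest first; ties are broken by ISBN for stable output.
    static List<String> likedBooks(Map<String, Integer> ratingsByIsbn, int minRating) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(ratingsByIsbn.entrySet());
        entries.removeIf(e -> e.getValue() < minRating);
        entries.sort((a, b) -> {
            int byRating = Integer.compare(b.getValue(), a.getValue());
            return byRating != 0 ? byRating : a.getKey().compareTo(b.getKey());
        });
        List<String> books = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            books.add(e.getKey());
        }
        return books;
    }

    // Emit every pair of books liked by the same user.
    // The user ID is no longer part of the output at this point.
    static List<String[]> pairs(List<String> books) {
        List<String[]> out = new ArrayList<>();
        for (int i = 0; i < books.size(); i++) {
            for (int j = i + 1; j < books.size(); j++) {
                out.add(new String[] {books.get(i), books.get(j)});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up ratings for a single user.
        Map<String, Integer> ratings = new HashMap<>();
        ratings.put("A", 9);
        ratings.put("B", 3);
        ratings.put("C", 7);

        List<String> liked = likedBooks(ratings, 5);
        for (String[] pair : pairs(liked)) {
            System.out.println(pair[0] + " " + pair[1]);
        }
    }
}
```

<p>In a MapReduce job, a grouping like this would typically happen per user key, with the book pairs emitted as the keys of a later stage; the sketch only shows the data transformation itself.</p>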
<pre><code> </p>
<p class="description font-light">
<span class="font-bold"> <b> <u> Recommendations filtering based on Demographic data: </u></b></span>
</p>
<p class="description font-light">
<br>• We grouped the data so that it can be clustered by the location or country each user belongs to, and applied clustering to the data by country.
<br>• The user’s age can also be used to cluster the data. Each user then receives recommendations drawn from his/her own country cluster. This reduced the amount of data sent to the second stage of the engine, where the collaborative algorithm runs, and so improved the overall computation speed.
<br>• We implemented this step in Java using Hadoop MapReduce, since it is a one-time computation.
</p>
<a class="anchor" id="SystemImplementation"></a>
<h3 class="heading font-bold">System Implementation</h3>
<p class="description font-light">
<span class="font-bold"> <b> <u> Components: </u></b></span>
</p>
<p class="description font-light">
<br>• Data Preprocessing - Rating normalization and scaling; grouping each user's age, books, country and ratings into one file; and removing nulls
<br>• Core System - Finds the item-item similarity using three measures <br>
&nbsp; &nbsp;&nbsp;&nbsp; 1. <a target="_blank" href="https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient">Correlation </a><br>
&nbsp; &nbsp;&nbsp;&nbsp; 2. <a target="_blank" href="https://en.wikipedia.org/wiki/Cosine_similarity">Cosine Similarity </a><br>
&nbsp; &nbsp;&nbsp;&nbsp; 3. <a target="_blank" href="https://en.wikipedia.org/wiki/Jaccard_index">Jaccard index </a><br>
<br>• Consolidating the results - The output of the core system is grouped by country and by age, and scores every book against every other book. To produce recommendations for a user, we process the core system output together with the books the user has already read, and recommend only books he/she has not read.
</p>
<p class="description font-light">
<span class="font-bold"> <b> <u> System Architecture: </u></b></span>
</p>
<div class="mxgraph" style="max-width:100%;border:1px solid transparent;" data-mxgraph="{&quot;highlight&quot;:&quot;#0000ff&quot;,&quot;nav&quot;:true,&quot;resize&quot;:true,&quot;xml&quot;:&quot;&lt;mxfile userAgent=\&quot;Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36\&quot; version=\&quot;6.0.1.9\&quot; editor=\&quot;www.draw.io\&quot; type=\&quot;google\&quot;&gt;&lt;diagram name=\&quot;Page-1\&quot;&gt;&lt;/diagram&gt;&lt;/mxfile&gt;&quot;,&quot;toolbar&quot;:&quot;pages zoom layers lightbox&quot;,&quot;page&quot;:0}"></div>
</code></pre>
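<p>The three similarity measures listed under Components can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's MapReduce code: the class name <code>ItemSimilarity</code> is hypothetical, each book is represented as a map from user ID to rating, Pearson correlation is computed only over users who rated both books, and the Jaccard index is computed on the sets of users who rated each book.</p>

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; each book is a map from user ID to rating.
final class ItemSimilarity {

    // Pearson correlation over users who rated both books; returns 0 when undefined.
    static double pearson(Map<String, Double> a, Map<String, Double> b) {
        Set<String> common = new HashSet<>(a.keySet());
        common.retainAll(b.keySet());
        int n = common.size();
        if (n < 2) {
            return 0.0;
        }
        double sumA = 0, sumB = 0;
        for (String user : common) {
            sumA += a.get(user);
            sumB += b.get(user);
        }
        double meanA = sumA / n, meanB = sumB / n;
        double cov = 0, varA = 0, varB = 0;
        for (String user : common) {
            double da = a.get(user) - meanA;
            double db = b.get(user) - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        double denom = Math.sqrt(varA * varB);
        return denom == 0 ? 0.0 : cov / denom;
    }

    // Cosine similarity of the two rating vectors, treating a missing rating as 0.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        double denom = Math.sqrt(normA) * Math.sqrt(normB);
        return denom == 0 ? 0.0 : dot / denom;
    }

    // Jaccard index on the sets of users who rated each book; rating values are ignored.
    static double jaccard(Map<String, Double> a, Map<String, Double> b) {
        Set<String> union = new HashSet<>(a.keySet());
        union.addAll(b.keySet());
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a.keySet());
        intersection.retainAll(b.keySet());
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // Made-up ratings for two books.
        Map<String, Double> bookA = new HashMap<>();
        bookA.put("u1", 1.0);
        bookA.put("u2", 2.0);
        bookA.put("u3", 3.0);
        Map<String, Double> bookB = new HashMap<>();
        bookB.put("u1", 2.0);
        bookB.put("u2", 4.0);
        bookB.put("u3", 6.0);
        bookB.put("u4", 1.0);

        System.out.println("pearson = " + pearson(bookA, bookB));
        System.out.println("cosine  = " + cosine(bookA, bookB));
        System.out.println("jaccard = " + jaccard(bookA, bookB));
    }
}
```

<p>Because the Jaccard function looks only at who rated each book, two books rated by the same users score the same whether those users loved or disliked them, while the other two measures respond to the rating values.</p>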
<pre><code> <a class="anchor" id="Motivation"></a>
<h3 class="heading font-bold">Motivation:</h3>
<span class="font-bold"> We checked how the recommendation engine behaves in a real-world scenario by adding one of our teammates to the dataset as a new user, together with his ratings for books in the dataset that he had already read. The engine suggested books that matched his interests, based on the ratings he provided. This was one of the most useful and interesting outcomes of the project.
</p>
<a class="anchor" id="Dataset"></a>
<h3 class="heading font-bold">Dataset:</h3>
<p> <span class="font-bold"> We use the Book-Crossing Dataset, which was mined by Cai-Nicolas Ziegler, DBIS Freiburg, from the Book-Crossing Community. The dataset contains 278,858 users (anonymized but with demographic information) providing 1,149,780 ratings (explicit / implicit) about 271,379 books. It is freely available for research purposes from <a href="http://www2.informatik.uni-freiburg.de/~cziegler/BX/." target="_blank"> here </a>
</p>
<p>
<b><u>Data Description:</u></b>
The Book-Crossing dataset comprises 3 tables in comma-separated values (CSV) files.
<br>• <a href="http://www2.informatik.uni-freiburg.de/~cziegler/BX/BX-CSV-Dump.zip" target="_blank">BX-Users </a>
Contains the users. Note that user IDs (‘User-ID’) have been anonymized and map to integers. Demographic data is provided (‘Location’, ‘Age’) if available. Otherwise, these fields contain NULL-values.
<br>• <a href="http://www2.informatik.uni-freiburg.de/~cziegler/BX/BX-CSV-Dump.zip" target="_blank">BX-Books</a>
Books are identified by their respective ISBN. Invalid ISBNs have already been removed from the dataset. Moreover, some content-based information is given (‘Book-Title’, ‘Book-Author’, ‘Year-Of-Publication’, ‘Publisher’), obtained from Amazon Web Services. Note that in case of several authors, only the first is provided. URLs linking to cover images are also given, appearing in three different flavors (‘Image-URL-S’, ‘Image-URL-M’, ‘Image-URL-L’), i.e., small, medium, large. These URLs point to the Amazon web site.
<br>• <a href="http://www2.informatik.uni-freiburg.de/~cziegler/BX/BX-CSV-Dump.zip" target="_blank">BX-Book-Ratings</a>
Contains the book rating information. Ratings (‘Book-Rating’) are either explicit, expressed on a scale from 1-10 (higher values denoting higher appreciation), or implicit, expressed by 0.
</code></pre>
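<p>As a rough illustration of the rating rules described above (explicit ratings from 1 to 10, implicit ratings recorded as 0), the following sketch parses one line of the ratings file. Everything named here is an assumption: the class <code>RatingLine</code> is hypothetical, the field order (User-ID, ISBN, Book-Rating) is assumed, and the field separator is passed in as a parameter rather than fixed, since the exact file layout should be checked against the downloaded dump.</p>

```java
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical record of one rating line; field order is an assumption.
final class RatingLine {
    final String userId;
    final String isbn;
    final int rating;

    RatingLine(String userId, String isbn, int rating) {
        this.userId = userId;
        this.isbn = isbn;
        this.rating = rating;
    }

    // Explicit ratings are 1-10; a rating of 0 marks an implicit rating.
    boolean isExplicit() {
        return rating >= 1 && rating <= 10;
    }

    // Parse one line; returns empty for headers, malformed lines,
    // and ratings outside the 0-10 range.
    static Optional<RatingLine> parse(String line, String separator) {
        String[] fields = line.split(Pattern.quote(separator), -1);
        if (fields.length != 3) {
            return Optional.empty();
        }
        try {
            int rating = Integer.parseInt(unquote(fields[2]));
            if (rating < 0 || rating > 10) {
                return Optional.empty();
            }
            return Optional.of(new RatingLine(unquote(fields[0]), unquote(fields[1]), rating));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    // Strip surrounding whitespace and one pair of double quotes, if present.
    static String unquote(String s) {
        String t = s.trim();
        if (t.length() >= 2 && t.startsWith("\"") && t.endsWith("\"")) {
            return t.substring(1, t.length() - 1);
        }
        return t;
    }

    public static void main(String[] args) {
        // Made-up sample line, not copied from the dataset.
        Optional<RatingLine> parsed = parse("\"100027\";\"0448425831\";\"0\"", ";");
        parsed.ifPresent(r ->
                System.out.println(r.userId + " " + r.isbn + " explicit=" + r.isExplicit()));
    }
}
```

<p>Separating explicit from implicit ratings early makes it easy to decide, in the preprocessing step, whether implicit ratings should be dropped or handled differently from the 1-10 scale.</p>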
<p></p>
<a id="Framework"></a>
<h3>
<a id="framework" class="anchor" href="#framework" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Framework:</h3>
<p>
The entire recommendation engine in this project was built on the Hadoop MapReduce framework using Java.
<br>Environments:
<br>• For development, we used Cloudera.
<br>• For testing, we used the UNCC Hadoop DSBA cluster.
<br>• For implementation and demo, we used an Amazon AWS cluster.</p>
<pre><code> </p>
<a class="anchor" id="Illustrative"></a>
<h3 class="heading font-bold">Illustrative results and Examples:</h3>
<br><h4><b>Data Preprocessing<u></u></b></h4>
<br><h5><b><u>File1- After Preprocessing</u></b></h5>
<p> <b>
"100027"&&&&& &&&&&"0448425831"######-7\s&&&&&canada"~~~~~unknown
"100028"&&&&& &&&&&"185967318X"######3\s&&&&&jersey,"~~~~~unknown
"100029"&&&&& &&&&&"0140219854"######3\s"3498020862"######-5\s"0446523747"######-7\s"0312983263"######-7\s"0312954468"######-7\s&&&&&germany"~~~~~unknown
"10003"&&&&& &&&&&"068483068X"######3\s"0743446593"######3\s&&&&&usa"~~~~~"20"
"100030"&&&&& &&&&&"0744552192"######-7\s"0679735909"######-7\s&&&&&usa"~~~~~"15"</b>
</p>
<br><h4><b>Core System<u></u></b></h4>
<br><h5><b><u>File2- After Core system from Correlation based on Country</u></b></h5>
<p> <b>
australia","0006476007" "0330243829" 1.0,"0749309423" 1.0,"0733613675" 1.0,"0140031499" 1.0,"0749316063" 1.0,"0552103721" 1.0,"0552125695" 1.0,"0099245027" 1.0,
australia","0061030015" "0345466810" 1.0,"0446608815" 0.8,"0312421184" 0.7,"0224018256" 0.0,"1585672939" 0.0,"0099771519" -0.7,
australia","0142003581" "0060930535" 1.0,"0061015725" 0.7,"0224018256" -0.7,"0385319452" -1.0,"0786868716" -1.0,"0767903862" -1.0,"0671024248" -1.0,
australia","0312924801" "0446610801" 1.0,"0553258877" 1.0,"0553280341" 1.0,"0886777712" 0.6,
australia","0340613696" "000649983X" 1.0,"0450411435" 1.0,"0007140676" 1.0,"0099281082" 1.0,"0449007251" -0.9,"0860074382" -1.0,"0099312514" -1.0,
australia","0385720106" "0060934417" 1.0,"033031582" -0.7,
australia","0399131493" "014043223X" -1.0,
australia","0452285011" "0330361163" 1.0,"0140289690" 1.0,"061328125X" 1.0,"0452282152" 1.0,"1565122178" 1.0,"0790008696" 1.0,"0571206484" 0.9,"0091882087" -1.0,
australia","0575049804" "0887307876" -1.0,"0330349678" -1.0,"067976402X" -1.0,"0971880107" -1.0,</b>
</p>
<br><h5><b><u>File2- After Core system from correlation based on Age</u></b></h5>
<p> <b>
15","0440413141" "014131088X" 1.0,
"16","0312990456" "0590962736" 1.0,
"18","0553264613" "0515126772" 1.0,"0446310786" 1.0,"0380776839" 1.0,"0452281458" 1.0,
"18","0671870114" "0671737821" 1.0,"0671871005" 1.0,"0671744216" 1.0,"0671776800" 0.9,
"18","0671871005" "0515120006" 1.0,"0671737821" 1.0,"0671870114" 1.0,"0671737791" 1.0,"0671776800" 0.9,"0671744216" 0.9,
"18","0684856093" "0312195516" 1.0,"0441005993" 1.0,"0064400581" 1.0,"084233226X" 0.7,"0061057894" -1.0,"0842329250" -1.0,"0842329277" -1.0,"0553274295" -1.0,"0842329269" -1.0,
"19","0345413350" "044651652X" 1.0,
"20","0671534734" "0689813953" 1.0,"0679781587" 1.0,
"20","1558746161" "1558747613" 1.0,
"21","0060930535" "0684801523" 0.8,"0804114986" -1.0,"0316777730" -1.0,
"21","0375826688" "0971880107" 1.0,"1840720050" 0.0,
"21","0380789019" "0679781587" 1.0,
"21","0399149325" "0060931418" -1.0,"0451526341" -1.0,
"22","0099771519" "0385504209" 1.0,</b>
</p>
<br><h4><b>Consolidating Result</b></h4>
<br><h5><b><u>File2- After Core system from Correlation</u></b></h5>
<p> <b>
"101026" "0393050939",1.0,"0449908100",1.0,"0684800713",1.0,"006105111X",1.0,"042518000X",1.0,"0316037451",1.0,"0449221393",1.0,"037542217X",1.0,"0749397543",1.0,"0763604089",1.0,"0486295699",1.0,"067982412X",1.0,"0440221919",1.0,"0373250126",1.0,"0679804196",1.0,"0836220676",1.0,"0441644511",1.0,"037322592X",1.0,"0446606456",1.0,"0553573314",1.0,"068812701",1.0,"0061020400",1.0,"0385319207",1.0,"1551669358",1.0,"0451410610",1.0,
"10124" "0060254920",1.0,"0060191929",1.0,"0375412409",1.0,"0451402502",1.0,"1573229725",1.0,"0060977744",1.0,"034530358X",1.0,"037325024X",1.0,"0688105548",1.0,"0525945938",1.0,"0764225081",1.0,"0441644511",1.0,"0380707837",1.0,"0140386742",1.0,"0440224764",1.0,"0345462351",1.0,"0425151867",1.0,"0886779758",1.0,"1551669358",1.0,"0451526279",1.0,"0515124060",1.0,"0553801430",1.0,"0060977701",1.0,"0743255453",1.0,"1551669005",1.0,"0446605980",1.0,"0764227173",1.0,"0440208459",1.0,"0312288662",1.0,"0446609943",1.0,"</b>
</p>
<a class="anchor" id="Performance"></a>
<br><h3><b>Performance Evaluation:</b> </h3>
<p>
The results above show the similarity scores produced by the core system. The most appropriate recommendations came from the correlation measure and, to some extent, from cosine similarity. The Jaccard index does not take the actual rating values into account: it uses only the number of ratings the two items have in common and the size of the union of the two items' rating sets. As an enhancement, we are trying to merge the three scores to build a better system.
</p>
<a class="anchor" id="Accomplished"></a>
<br><h3><b>Accomplished:</b> </h3>
<p>
<br>• Definitely will accomplish: As proposed, we have successfully implemented the Book Recommendation Engine from scratch using the Collaborative + Demographic recommendation model.
</code></pre>
<p><br>• Likely to accomplish: As we mentioned, we also tried to build a content-based recommendation engine, but due to time constraints we were not able to complete its design.
<br>• Would ideally like to accomplish (in future): A responsive UI that shows the recommendations on a web page, and a database that stores each new user's data so that he/she can be added to the existing dataset and included in further analysis and recommendation computation. </p>
<pre><code> </p>
<a class="anchor" id="Roles"></a>
<br><h4><b>Roles and Responsibilities:</b> </h4>
<p>
This project involved environment setup (Hadoop, Spark, dataset setup), design, coding, testing, documentation, and regular meetings for project status updates.
</code></pre>
<p><br><br><b><h4>
<a id="specific-task-assignments-" class="anchor" href="#specific-task-assignments-" aria-hidden="true"><span aria-hidden="true" class="octicon octicon-link"></span></a>Specific Task assignments: </h4></b>
<br><b>1. Lakshmanan Ramu Meenal</b><br>
a. Environment Setup<br>
b. Coding
<br><b>2. Prasanna Kumar Rajendran</b><br>
a. Setting up meetings on a regular basis<br>
b. Design<br>
c. Coding
<br><b>3. Senthil Kumar Karthikeyan</b><br>
a. Documentation<br>
b. Coding</p>
</div>
</div>
</div>
</div>
</div>
<footer class="site-footer">
<span class="site-footer-owner"><a href="https://github.com/rprasanakumar/CloudComputing">Cloudcomputing</a> is maintained by <a href="https://github.com/rprasanakumar">rprasanakumar</a>.</span>
<span class="site-footer-credits">This page was generated by <a href="https://pages.github.com">GitHub Pages</a> using the <a href="https://github.com/jasonlong/cayman-theme">Cayman theme</a> by <a href="https://twitter.com/jasonlong">Jason Long</a>.</span>
</footer>
</section>
</body>
</html>