Datasets for Aspect-Based Sentiment Analysis in Bangla and Its Baseline Evaluation


Data Descriptor

Md. Atikur Rahman * and Emon Kumar Dey *

Institute of Information Technology, University of Dhaka, Dhaka 1000, Bangladesh
* Correspondence: bsse0521@iit.du.ac.bd (M.A.R.); emonkd@iit.du.ac.bd (E.K.D.)

Data 2018, 3, 15
Received: 20 March 2018; Accepted: 2 May 2018; Published: 4 May 2018

Abstract: With the extensive growth of user interactions through the prominent advances of the Web, sentiment analysis has obtained more focus from both an academic and a commercial point of view. Recently, sentiment analysis in the Bangla language has progressively been considered an important task, for which previous approaches have attempted to detect the overall polarity of a Bangla document. To the best of our knowledge, there is no research on aspect-based sentiment analysis (ABSA) of Bangla text. This can be attributed to the lack of available datasets for ABSA. In this paper, we provide two publicly available datasets to perform the ABSA task in Bangla. One of the datasets consists of human-annotated user comments on cricket, and the other dataset consists of customer reviews of restaurants. We also describe a baseline approach for the subtask of aspect category extraction to evaluate our datasets.

Dataset License: CC0

Keywords: ABSA dataset; Bangla ABSA; aspect extraction from Bangla

1. Summary

People trust human opinion more than traditional advertising. For example, consumers are used to seeking advice and recommendations from others before making decisions regarding important purchases. Word of mouth (WOM) has always been salient for consumers when making a decision. Such referrals have a strong impact on both customer decision-making and new customer acquisition for purchasing a company's product or service [1]. On the other hand, organizations are eager to mine all the activities and interactions of people to understand what their weaknesses and strengths are. This understanding would help them to develop their organizational strategy in this competitive world.
Sentiment analysis (or opinion mining) is a process to determine the viewpoint of a person on a certain topic. It classifies the polarity of a document (i.e., review, tweet, blog, or news), that is, whether the communicated opinion is positive, negative, or neutral. There are three levels at which sentiment is analyzed [2]: the document level, the sentence level, and the aspect level. The document level considers that a document has an opinion on an entity, and the task is to classify whether an entire document expresses a positive or negative sentiment. The task at the sentence level regards sentences, determining whether each sentence expresses a positive, negative, or neutral opinion. Neither document-level nor sentence-level analysis discovers exactly what people liked and did not like. The aspect level (or aspect-based sentiment analysis, ABSA) performs a finer-grained analysis that identifies the aspects of a given document or sentence and the sentiment expressed towards each aspect. This level of analysis is the most detailed version and is capable of discovering complex opinions from reviews.

There are two major tasks when performing ABSA. The first is to extract the specific areas or aspects mentioned in the opinionated review. The second is to identify the polarity (either positive, negative, or neutral) for every aspect. For example, the following review of a restaurant reveals two aspects: service and food. Both aspects have a positive polarity.

"The service was excellent and the food was delicious."

As one can see, the names of the aspect categories are explicitly mentioned in this review. A review might also contain implicit categories; for example, "The staff makes you feel at home and the chicken is great." Here, the same aspects, service and food, are contained without being directly mentioned.

Semantic Evaluation (SemEval), a reputed workshop in the NLP domain, introduced a complete dataset [3] in English for the ABSA task. Later, the ABSA task was expanded by adding multilingual datasets in which eight languages over seven domains were incorporated. To perform ABSA, datasets in several languages, such as Arabic [4], Czech [5], and French [6], were created. There is no dataset for Bangla in the field of ABSA. Consequently, no work is being done to extract aspects and to identify the corresponding polarities for Bangla reviews. We are currently working on a project to extract aspects from Bangla reviews or comments on a particular product of a company, as online shopping is very popular nowadays in Bangladesh and is growing rapidly. People like to buy products online after reading the comments of others.

In this paper, we have created two new datasets that serve as a benchmark for the ABSA domain in Bangla texts. We present two datasets named Cricket and Restaurant. The first dataset contains 2900 comments on cricket over 5 aspect categories, and the second dataset contains 2600 restaurant reviews. Because there is no work in Bangla for the ABSA task, we have introduced ABSA by extracting aspect categories from Bangla texts in order to evaluate our datasets.
We performed the task with different training approaches and found a satisfactory outcome compared to the evaluations in other languages.

There are some related works on which we founded the idea of this topic. The restaurant review dataset provided by Ganu et al. [7] was used to improve rating predictions. Their annotations included six aspect categories and overall sentence polarities. They had not prepared a complete ABSA dataset, as the aspect category was present but the corresponding polarity of that aspect was absent. The SemEval 2014 evaluation campaign [3] extended their dataset by adding three more fields alongside the aspect category. They published their dataset with four fields for each review, that is, the aspect terms occurring in the sentences, the aspect term's polarity, the aspect category, and the aspect category's polarity. They also provided a laptop-review dataset manually annotated with entities similar to those of the restaurant dataset. These are the benchmark datasets that researchers [8–11] have used for performing the ABSA task. The task was repeated in SemEval 2015 [12], for which the aspect categories were a combination of an entity type and an attribute type. Multilingual datasets were released in the SemEval 2016 workshop [13] on seven domains (restaurant, laptop, mobile phone, digital camera, hotel, museum) in eight languages (English, Arabic, French, Chinese, Turkish, Spanish, Dutch, and Russian). A book-review dataset in the Arabic language was provided by [4]. They annotated book reviews into 14 categories and 4 types of polarities, including Conflict. In [5], the authors created an IT product-review dataset for the ABSA task, which contained a total of 2200 reviews.

The contributions of this paper are as follows:

• We have collected and presented two Bangla datasets for ABSA and have made them publicly available.
• We performed statistical and linguistic analysis on the datasets.

• We implemented state-of-the-art machine learning approaches on the collected datasets and found satisfactory accuracies.

2. Data Description

Digital Bangladesh is an integral part of the Bangladesh government's Vision. The Internet is growing very fast across the country, and people are using different online platforms in every aspect of their lives. This encouraged us to construct datasets to analyze Bengali people's opinions and to extract their sentiments on different aspects. In this paper, we have constructed two different datasets, namely, the Cricket dataset and the Restaurant dataset, to evaluate people's opinions.

2.1. Cricket Dataset

The Cricket dataset consists of 2900 different comments from different online sources with five different aspect categories. Most of the comments were collected from Facebook pages. Some comments were collected from two popular Bengali websites, BBC Bangla and Daily Prothom Alo. This dataset was collected by the authors of this paper. The comments are of different lengths. The reasons behind choosing these websites for collecting data are given below:

• BBC Bangla and Daily Prothom Alo are very popular online news sites for the Bengali community all around the world. They are popular for publishing trustworthy and authentic news. Bengali people frequently read the news and sometimes make comments to share their opinion.
• Although people write their comments or opinions in both Bangla and English, most of the time they choose Bangla. We studied different articles and found that in almost 90% of cases, people expressed their opinion in Bangla.
• The Facebook page of Prothom Alo has over 13 million followers, and that of BBC Bangla has over 11 million. These two pages provide an enormous number of text posts as well as a large number of comments.
• Cricket is one of the most popular games nowadays for Bengali people. We found that people are more interested in making comments on cricket-related news than on any other topic. Thus, we chose this category for our experiment.

Table 1 shows examples of comments collected from the Facebook pages.

Table 1. Example cricket-related comments on the Prothom Alo and BBC Bangla Facebook pages.

Comments: ম শর ফ এক জ দ ক র ন ম য ন মট ন লই মন ভ র য য় আম দর জ হর,জনসন, প লক, ট ল ন ই ত ব এক জন ম শর ফ আ ছ ব ল র দরও দ ষ নই, দ ষট 100% ম নজ ম র পস ব ল র দর জন উই কট ত র ত ত র ই ক র ন দ ষট ম নজ ম ন ম ন ল আ ম নব আশ ক র ত স কন অ ত ত দ ল ফর ব আর নয় মত খল ব এব চ র আউট কর ব এখন দশ ক দর ক ছ জন য় হ ট /ওয় ন ড ম ন ষ এখন দখ তই চ য়ন.. হ র লও ব ল দশ জত লও ব ল দশ আগ ম ত আব র আমর ই জত ব ইশ! আম দর য দ বর ট ক হ লর ম ত একট ব ট ন থ ক ত আম র পর মশ হ ল কট র দর ক প রট জগত থ ক দ র র খ ত হ ব
Source: Prothom Alo Facebook page (first four comments); BBC Bangla Facebook page (last two comments)

People usually comment in Bangla about the news. We also found that 5–10% of the time, they commented in English or wrote Bangla sentences in the English alphabet. We did not consider these opinions in our dataset. In addition, some comments had only emoticons and no other text or words. We also omitted these from our dataset. All processes were done manually by the authors. The following section describes the annotation process for the collected corpus of cricket-related comments.

Annotation of the Cricket Dataset

The Bangla text on cricket was annotated jointly by the authors, a group of second-year students of BSSE, and two employees from the Institute of Information Technology, University of Dhaka, Bangladesh. All participants agreed to categorize the whole dataset into five different aspect categories. These were bowling, batting, team, team management, and other. Given a comment, the task of the annotators was to recommend aspect category and polarity labels for each. Three types of polarities were considered, that is, positive, negative, and neutral. Table 2 shows information about the participants.

Table 2. Information about the participants in data collection.

Participant ID | Gender | Profession | Task
P1 | Male | MS student/author | Data collection (Cricket) and annotation
P2 | Male | Faculty/author | Data collection (Cricket) and annotation
P3 | Male | Graduate student | Annotation (Cricket) and translation (Restaurant)
P4 | Female | Graduate student | Annotation (Cricket) and translation (Restaurant)
P5 | Female | Graduate student | Annotation (Cricket) and translation (Restaurant)
P6 | Male | Graduate student | Annotation (Cricket) and translation (Restaurant)
P7 | Male | Graduate student | Annotation (Cricket) and translation (Restaurant)
P8 | Male | Graduate student | Annotation (Cricket) and translation (Restaurant)
P9 | Female | Accountant | Annotation
P10 | Male | Officer | Annotation

Each participant categorized every comment of the dataset. We applied a majority voting technique to make the final decision about the aspect category and polarity of a sentence. As an example, we have taken the following comment:

এই প চ র ন কর ট ফ, ব ল ন স হ ভ ল হ য় ছ

The voting result found for the comment is given in Table 3.

Table 3. Voting example to define category and polarity.

Comment: এই প চ র ন কর ট ফ, ব ল ন স হ ভ ল হ য় ছ

Participant | Voting for Category | Voting for Polarity
P1 | Bowling | Positive
P2 | Bowling | Positive
P3 | Batting | Negative
P4 | Batting | Negative
P5 | Other | Neutral
P6 | Bowling | Positive
P7 | Other | Neutral
P8 | Bowling | Positive
P9 | Bowling | Positive
P10 | Batting | Negative

From Table 3, we can see that the comment had three votes for Batting with a negative polarity, two votes for Other with a neutral polarity, and four votes for Bowling with a positive polarity. Thus, our method determined this comment as being in the Bowling category with a positive polarity. We also had ties for some comments. In this situation, we took both categories with their polarity in our dataset. Table 4 shows an example of this scenario.
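The voting rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `resolve_votes` is hypothetical, the vote lists are illustrative rather than copied from the dataset, and the rule for polarity (majority among the annotators who chose a winning category) is an assumption, since the text does not spell out how polarity ties are broken.

```python
from collections import Counter


def resolve_votes(votes):
    """Resolve annotator votes into final (category, polarity) labels.

    votes: list of (category, polarity) pairs, one pair per annotator.
    Every category that reaches the highest vote count is kept, so a tie
    yields several labels. Polarity is taken by majority among the
    annotators who voted for that category (an assumption of this sketch).
    """
    category_counts = Counter(category for category, _ in votes)
    top = max(category_counts.values())
    winners = sorted(c for c, n in category_counts.items() if n == top)

    labels = []
    for category in winners:
        polarity_counts = Counter(p for c, p in votes if c == category)
        # most_common breaks polarity ties by first occurrence (assumption).
        labels.append((category, polarity_counts.most_common(1)[0][0]))
    return labels


# Illustrative votes: a clear majority, as in the Table 3 scenario.
clear_majority = ([("bowling", "positive")] * 5
                  + [("batting", "negative")] * 3
                  + [("other", "neutral")] * 2)

# Illustrative votes: a 50/50 tie between two categories.
tie = [("batting", "negative")] * 5 + [("team", "negative")] * 5

# Illustrative votes: tied categories with mixed polarity inside one of them
# (the polarity of the "team" votes here is hypothetical).
mixed = ([("bowling", "positive")] * 3 + [("bowling", "negative")]
         + [("team", "negative")] * 4 + [("other", "neutral")] * 2)

print(resolve_votes(clear_majority))
print(resolve_votes(tie))
print(resolve_votes(mixed))
```

Keeping every top-voted category, rather than picking one arbitrarily, is what makes the resulting labels multi-label for some comments.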

Table 4. Voting for category identification.

Comment: ওর 200 ক র ছ, ত মর 100 কর ত প র ব ন?

Voting for Category | Votes (of 10 participants)
Batting | 5
Team | 5

We can see from Table 4 that 50% of the evaluators voted for Batting and 50% of the votes were for the Team category, both with a negative polarity. As they had tied, our algorithm took both of these categories in the labeled dataset with a negative polarity.

We also faced another kind of problem in constructing the dataset. After calculating the category, we found dissimilarities among participants regarding the polarity of some comments. For example, when we considered the following comment, we found the voting result given in Table 5.

Table 5. Problem related to polarity determination.

Comment: র ক ব ড় হ য় গ ছ, খল প রন ত হ ল আজ কভ ব ক কর ল?

Voting for Category | Votes (of 10 participants) | Voting for Polarity
Bowling | 4 | 3 positive, 1 negative
Team | 4 |
Other | 2 |

From Table 5, we can see that, for the comment, both the Bowling and Team categories had four votes. Thus, we kept both of these categories in our annotated dataset. For polarity, there were three positive votes and one negative vote for Bowling. Thus, we considered it as having a positive polarity and added it to our dataset. Table 6 shows a sample of the labeled Cricket dataset.

In Table 7, a summary of the complete Cricket dataset is presented. We can see that there are a total of 3034 different comments with five different categories, that is, Batting, Bowling, Team, Team Management, and Other. Each of the categories contains three different polarities: positive, negative, and neutral. For example, the Batting category contains a total of 583 comments, of which 138 are positive, 389 are negative, and 56 are of neutral polarity. The Bowling category contains 154 positive, 145 negative, and 33 neutral polarity comments. The Team, Team Management, and Other categories contain totals of 774, 332, and 1013 comments, respectively.

Table 6. A part of the Cricket dataset in xlsx format.

Comment | Category | Polarity
বয প র ন eট ধ ম t ঘর টন ছ ড় কছ ন | other | neutral
ভক মন ট iগ র দর জনয | team | positive
ব ল দশ eখ ন ত ম মর য গয o পন র প ল ন | batting | negative
ব ল দশ হ র ব আজ | team | negative
স তজন sন র ন য় ম ঠ ব তল ট ন ন য য় ময চ জত য য় ন | bowling | negative
ট নর পচ ব ন য় ঔষধ খ জ ল ভ ক ব u n পচ ব ন লi হয় | team management | negative
eট প র দ ম ফ k eকট খল হi স | team | negative
জয় ধ সম য়র a পk | other | positive
যi জ য়র সম ন!! | other | neutral
ব ল রর য প রম ন শটর বল দ c- ত ত র ন কত ব শ হয় সট i দখ র বষয়! | bowling | negative
ত ক টs আর o ডআi দ ল নয় মত চ i | team | positive
ব ল দশ k কট আ র e গ য় য ব, o পন র দর eকট ভ ল কর ত হ ব | team | positive
ব ল দশ k কট আ র e গ য় য ব, o পন র দর eকট ভ ল কর ত হ ব | batting | negative
ফ রi চমক দখ লন র j ক | bowling | positive
নব ন দর স য গ দয় দরক র. | other | neutral
ব ল পচ ত ব আম দর বয টসময ন দর আuট ল আtহতয ছ ড় আর কছ i নয় | batting | negative
ব ল পচ ত ব আম দর বয টসময ন দর আuট ল আtহতয ছ ড় আর কছ i নয় | bowling | neutral
দ য়tj ন হ নত র aভ ব? | other | negative

Table 7. The complete statistics of the Cricket dataset.

Category | Positive | Negative | Neutral | Total
Batting | 138 | 389 | 56 | 583
Bowling | 154 | 145 | 33 | 332
Team | | | | 774
Team Management | | | | 332
Other | | | | 1013
Total Comments | | | | 3034

Analysis of the Proposed Dataset

We used Zipf's law [14] for our proposed dataset.
Zipf's law is a statement based on the observation that, in a collection of data, the frequency of a given word should be inversely proportional to its rank in the corpus. The most frequent word (rank 1) in a dataset should occur approximately twice as often as the second most frequent word, three times as often as the third most frequent, and so on. Figure 1 shows the diagram in which we plotted the words of our dataset. The plot follows the trend of Zipf's law. We also calculated the reliability of the annotation process using the intraclass correlation (ICC).

2.2. Restaurant Dataset

To create the Bangla Restaurant dataset, we took help directly from the English benchmark Restaurant dataset [3]. All comments were abstractly translated into Bangla with their exact annotation. The original English dataset contains a total of 2800 different comments. Participants from the same group involved in the Cricket dataset's creation were involved in the translation process of the Restaurant dataset, except participants P9 and P10. We divided the original dataset equally into eight parts and distributed these to the participants. They translated their assigned parts of the original English dataset abstractly. Finally, participants P1 and P2 merged the separate sections and performed an extensive proofread.

Figure 1. Distribution of word frequencies of the Cricket dataset using Zipf's law.

Annotation Schema for Restaurant

The Restaurant reviews dataset [3] used in this paper was abstractly translated into Bangla. There were five types of aspect categories, that is, Food, Price, Service, Ambiance, and Miscellaneous. As the objective was to identify the aspect category and the corresponding polarity, the participants did not add aspect terms or their polarities. In terms of the polarity of an aspect category, we considered only three polarity labels, that is, positive, negative, and neutral. The original dataset consisted of four different polarity labels: positive, negative, neutral, and conflict. In our translated Bangla dataset, we omitted the conflict category and assumed it to be the same as the neutral category. The annotators were asked to assign each translated Bangla restaurant review to the categories and polarities given in the original dataset. Table 8 shows a sample of the translated Restaurant dataset.

Table 8. A part of the Restaurant dataset in xlsx format.

Review | Category | Polarity
খ ব স মত আসন আ ছ eব খ দয প oয় র জনয য থ a পk কর ত হ ব | ambience | negative
খ ব স মত আসন আ ছ eব খ দয প oয় র জনয য থ a পk কর ত হ ব | service | negative
দ ম ত লন ম লকভ ব কম | price | positive
3 i ছল মজ দ র | food | positive
য দo খ ব র ট চমৎক র ছল, e ট সs ছল ন | food | positive
য দo খ ব র ট চমৎক র ছল, e ট সs ছল ন | price | negative
খ ব ভ ল! | miscellaneous | positive
আচ রর স য জন খ ব ভ ল ছল | food | positive
ধ ম t র n i য সর ত নয়, সব সবসময় ম ন য গ eব ভ ল হ য় ছ | food | positive
ধ ম t র n i য সর ত নয়, সব সবসময় ম ন য গ eব ভ ল হ য় ছ | service | positive
সবর দ eক ট স nর ভড়, কn ক ন ক ল হল নi | ambience | positive
সj alsl eব প র র - ব n ব pশ স কর কছ i নi | ambience | neutral
আ ম ন ত য আম ক ব রবর ফ র য ত হ ব,!!! | miscellaneous | positive
সm বত e ট eক ট ছ ট আর মদ য়ক রs রn,ভ ল সj র স র ম nক aন ভ ত | ambience | positive
য দo খ দয ভ ল ছল প র বষন ছল ব | food | positive
য দo খ দয ভ ল ছল প র বষন ছল ব | service | negative
কম র ম ন য গ eব বn tপ ণর | service | positive
খ ব র ভ ল ছল | food | positive

Table 9 shows the complete statistics of the Bangla Restaurant dataset. We can see from the table that the five different categories, that is, Food, Price, Service, Ambiance, and Miscellaneous, contained 713, 178, 336, 234, and 613 reviews, respectively, with three different polarities. For example, the Food category contained 500 positive, 126 negative, and 87 neutral sentiment labels. The Service category contained 186 positive, 118 negative, and 32 neutral sentiments. We also found that the Restaurant dataset follows Zipf's law, as shown in Figure 2.

Figure 2. Word frequency of the Bangla Restaurant dataset according to Zipf's law.
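The rank-frequency check behind the Zipf's law plots can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the helper names `rank_frequency` and `zipf_slope` are hypothetical, and fitting a least-squares line in log-log space is one common way to judge the trend (a slope close to -1 indicates Zipf-like behavior). The toy corpus is synthetic and was constructed to follow the law exactly.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, frequency) pairs, with rank 1 for the most frequent word."""
    frequencies = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(frequencies, start=1))


def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank)."""
    pairs = rank_frequency(tokens)
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(freq) for _, freq in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct words to fit a slope")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic corpus: the word of rank i occurs 60 / i times (60, 30, 20, 15, 12),
# so frequency is exactly inversely proportional to rank.
tokens = [f"w{i}" for i in range(1, 6) for _ in range(60 // i)]

print(rank_frequency(tokens))
print(round(zipf_slope(tokens), 6))
```

On a real corpus, `tokens` would be the whitespace-separated words of all comments, and the slope would only approximate -1.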

Table 9. Complete statistics of the Bangla Restaurant dataset.

Category | Positive | Negative | Neutral | Total
Food | 500 | 126 | 87 | 713
Price | | | | 178
Service | 186 | 118 | 32 | 336
Ambiance | | | | 234
Miscellaneous | | | | 613

3. Baseline Evaluation

Our objective is to provide benchmark datasets for Bangla ABSA. Our datasets are designed for the two major tasks of ABSA. These are aspect category extraction and identification of the polarity for each aspect category. In this paper, we experimented with the first subtask, that is, extraction of the aspect category. We applied three major steps to extract the aspect category. Firstly, preprocessing was performed on the dataset. After this, we extracted features from the data and finally performed classification using some popular classification models.

3.1. Preprocessing and Feature Extraction

In the preprocessing phase, each Bangla document was represented as a bag of words. We applied traditional preprocessing steps for the evaluation. Firstly, punctuation and stop words were removed from each comment. After this, we removed digits from our dataset, because we found that digits were not necessary for the aspect category. Finally, we tokenized each Bangla word from our dataset. Thus, a vocabulary of Bangla words was prepared after the preprocessing. We created a feature matrix in which each review was represented by a vector over that vocabulary. Term frequency–inverse document frequency (TF–IDF) was used for calculating the features.

3.2. Results

In the training phase, the extracted feature sets were used to train popular supervised machine learning algorithms. Because this was a multi-label classification problem, we trained our models by setting up a multi-label output. We used the linear SVC in the support vector machine (SVM) implementation. The following machine learning algorithms were used:

I. Support vector machine (SVM)
II. Random forest (RF)
III. K-nearest neighbor (KNN)

After training was completed, our proposed Bangla test dataset was executed on the trained model. The result is shown in the following table and figure.

Table 10 shows the results for the task of aspect category extraction on the datasets we have presented in this paper. We can see that using SVM, we obtained the highest precision rate for both datasets. Both datasets showed a low recall and F1-score. Figure 3 shows the overall accuracy of the models using our datasets. The inherent nature of the datasets is the reason behind the lower performance of the models for both datasets. People share their opinions according to their individual judgment. Therefore, the variety of opinions in the datasets is much larger. On the other hand, aspect extraction is a multi-label classification problem. One's opinion might have multiple aspect categories. Conventional classifiers miss some of these aspect categories.

Table 10. Performance on the proposed datasets. Columns: Dataset, Model, Precision, Recall, F1-Score. Rows: Cricket (SVM, RF, KNN) and Restaurant (SVM, RF, KNN).

Figure 3. The result of the three models on our datasets (precision, recall, and F1-score for SVM, RF, and KNN on the Cricket and Restaurant datasets).

These results can be improved if we process and train on the datasets in a more sophisticated way.
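The preprocessing and TF–IDF feature construction of Section 3.1 can be sketched with the Python standard library alone. Several details here are assumptions made for illustration: the stop-word list is a tiny hypothetical sample, the helper names `preprocess` and `tfidf_matrix` are invented, and the weighting uses relative term frequency times log(N / document frequency), since the text does not state which TF–IDF variant was used. The three example sentences are made up and are not taken from the datasets.

```python
import math
import unicodedata
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the list used in the paper).
STOP_WORDS = {"এবং", "ও", "এই"}


def preprocess(text):
    """Drop punctuation and digits, split on whitespace, remove stop words."""
    cleaned = "".join(
        " " if unicodedata.category(ch).startswith("P")  # punctuation, incl. the danda
        or unicodedata.category(ch) == "Nd"              # ASCII and Bangla digits
        else ch
        for ch in text
    )
    return [token for token in cleaned.split() if token not in STOP_WORDS]


def tfidf_matrix(documents):
    """Return (vocabulary, matrix) with one TF-IDF row vector per document."""
    tokenized = [preprocess(doc) for doc in documents]
    vocabulary = sorted({token for doc in tokenized for token in doc})
    n_docs = len(tokenized)
    doc_freq = Counter(token for doc in tokenized for token in set(doc))
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against documents that end up empty
        matrix.append([
            (counts[token] / length) * math.log(n_docs / doc_freq[token])
            for token in vocabulary
        ])
    return vocabulary, matrix


# Made-up example comments (not from the Cricket or Restaurant datasets).
documents = ["বোলিং ভাল ছিল।", "ব্যাটিং ভাল ছিল না ১০০", "এই দল ভাল"]
vocabulary, matrix = tfidf_matrix(documents)

for token, weight in zip(vocabulary, matrix[0]):
    print(token, round(weight, 3))
```

A word that occurs in every document, such as "ভাল" here, gets a weight of zero under this variant, which is why TF–IDF tends to emphasize category-specific words over generic ones.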
In this work, we have taken all the vocabulary as features for evaluation after removing punctuation, stop words, and digits. Some state-of-the-art techniques for information gain can be applied to the dataset before classification, after the preprocessing steps, to attain better results.

4. Conclusions and Future Work

Two datasets are provided for ABSA of Bangla text. These datasets have been designed to perform two tasks covering aspect category extraction and identification of polarity for that aspect category. We also report baseline results to evaluate the task of aspect category extraction.

As future plans, we aim to enhance our work by including further domains such as cars, mobiles, and laptops. We are working on more advanced methods for ABSA of Bangla text using our datasets to achieve better performance.

Author Contributions: All authors contributed equally to this work, and have read and approved the final manuscript.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Trusov, M.; Bucklin, R.E.; Pauwels, K. Effects of word-of-mouth versus traditional marketing: Findings from an internet social networking site. J. Mark. 2009, 73.
2. Jeyapriya, A.; Selvi, C.K. Extracting Aspects and Mining Opinions in Product Reviews Using Supervised Learning Algorithm. In Proceedings of the 2nd International Conference on Electronics and Communication Systems (ICECS), Coimbatore, India, February.
3. Pontiki, M.; Galanis, D.; Pavlopoulos, J.; Papageorgiou, H.; Androutsopoulos, I.; Manandhar, S. SemEval-2014 Task 4: Aspect Based Sentiment Analysis. Available online (accessed on 3 May 2018).
4. Al-Smadi, M.; Qawasmeh, O.; Talafha, B.; Quwaider, M. Human Annotated Arabic Dataset of Book Reviews for Aspect Based Sentiment Analysis. In Proceedings of the 3rd International Conference on Future Internet of Things and Cloud (FiCloud), Rome, Italy, August.
5. Tamchyna, A.; Fiala, O.; Veselovská, K. Czech Aspect-Based Sentiment Analysis: A New Dataset and Preliminary Results. Available online (accessed on 3 May 2018).
6. Apidianaki, M.; Tannier, X.; Richart, C. Datasets for Aspect-Based Sentiment Analysis in French. Available online (accessed on 3 May 2018).
7. Gayatree, G.; Elhadad, N.; Marian, A. Beyond the Stars: Improving Rating Predictions Using Review Text Content. Available online (accessed on 3 May 2018).
8. Kiritchenko, S.; Zhu, X.; Cherry, C.; Mohammad, S. NRC-Canada-2014: Detecting Aspects and Sentiment in Customer Reviews. Available online (accessed on 3 May 2018).
9. Kiritchenko, S.; Zhu, X.; Cherry, C.; Mohammad, S. Supervised and Unsupervised Aspect Category Detection for Sentiment Analysis with Co-occurrence Data. In IEEE Transactions on Cybernetics; IEEE: Piscataway, NJ, USA.
10. Soujanya, P.; Cambria, E.; Gelbukh, A. Aspect extraction for opinion mining with a deep convolutional neural network. Knowl.-Based Syst. 2016, 108.
11. Pengfei, L.; Joty, S.; Meng, H. Fine-Grained Opinion Mining with Recurrent Neural Networks and Word Embeddings. Available online (accessed on 3 May 2018).
12. Pontiki, M.; Galanis, D.; Papageorgiou, H.; Manandhar, S.; Androutsopoulos, I. Semeval-2015 Task 12: Aspect Based Sentiment Analysis. Available online (accessed on 3 May 2018).
13. Pontiki, M.; Galanis, D.; Papageorgiou, H.; Androutsopoulos, I.; Manandhar, S.; AL-Smadi, M.; Al-Ayyoub, M.; Zhao, Y.; Qin, B.; De Clercq, O.; et al. SemEval-2016 Task 5: Aspect Based Sentiment Analysis. Available online (accessed on 3 May 2018).
14. Pak, A.; Paroubek, P. Twitter as A Corpus for Sentiment Analysis and Opinion Mining. Available online (accessed on 3 May 2018).

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.

Thursday, October 22, 2015, Kartik 7, 1422 BS, Muharram 8, 1437 Hijr

Thursday, October 22, 2015, Kartik 7, 1422 BS, Muharram 8, 1437 Hijr Thursday, October 22, 2015, Kartik 7, 1422 BS, Muharram 8, 1437 Hijr 35 Bangladeshis at El Paso detention centre in US end hunger strike Observer Online Desk Published :Thursday, 22 October, 2015, Time

More information

DR. MAIDUL ISLAM BA (Calcutta), MA (JNU), MPhil (JNU), DPhil (Oxon) University Education

DR. MAIDUL ISLAM BA (Calcutta), MA (JNU), MPhil (JNU), DPhil (Oxon) University Education October 2018 DR. MAIDUL ISLAM BA (Calcutta), MA (JNU), MPhil (JNU), DPhil (Oxon) Present Position: Assistant Professor of Political Science, Centre for Studies in Social Sciences, Calcutta. Institutional

More information

(SUSANTA GHOSH) Circle Secretary. Ref: AIBDPA/1/Mtg/01 Date:

(SUSANTA GHOSH) Circle Secretary. Ref: AIBDPA/1/Mtg/01 Date: Ref: AIBDPA/1/Mtg/01 Date: 06.12.12 A meeting of the Circle office bearers will be held on 13 th December 2012 at CTO Union office at 2pm to discuss the subjects as contained in the following agenda. All

More information

Introduction to the Transfer of Property Laws of Bangladesh

Introduction to the Transfer of Property Laws of Bangladesh Chapter-One Introduction to the Transfer of Property Laws of Bangladesh The word property is the outcome of human civilization. In the early stage of human civilization man had no idea of property or properties

More information

투표시유의사항. Voting on Election Day 在选举日当天 ন র ব চন র দ ন ভ ট প রদ ন

투표시유의사항. Voting on Election Day 在选举日当天 ন র ব চন র দ ন ভ ট প রদ ন Voting on Election Day 1. If this is your first time voting: most voters provide a Social Security number or driver s license when registering to vote. If you did not, you need to bring proof of identification

More information

Political Science Syllabus for Three Year Degree Course (Semester Pattern) (Honours and General) w.e.f Honours

Political Science Syllabus for Three Year Degree Course (Semester Pattern) (Honours and General) w.e.f Honours Political Science Syllabus for Three Year Degree Course (Semester Pattern) (Honours and General) w.e.f 2014-2015 Honours First Semester (July to December) Paper-PLSH-101 (50 marks) Western Political Thought.

More information

Final Draft BA (Honours)-CBCS Syllabus in Political Science, 2018 (Section I)

Final Draft BA (Honours)-CBCS Syllabus in Political Science, 2018 (Section I) University of Calcutta Final Draft BA (Honours)-CBCS Syllabus in Political Science, 2018 (Section I) Core Courses [Fourteen courses; Each course: 6 credits (5 theoretical segment+ 1 for tutorial-related

More information

Public Disclosure Authorized. Public Disclosure Authorized. Public Disclosure Authorized. Public Disclosure Authorized

Public Disclosure Authorized. Public Disclosure Authorized. Public Disclosure Authorized. Public Disclosure Authorized Public Disclosure Authorized Public Disclosure Authorized Public Disclosure Authorized Public Disclosure Authorized Ministry of Planning Air Vice Marshal (Retd.) A K Khandker Minister Government of the

More information

University of Calcutta. Draft BA (Honours)-CBCS Syllabus in Political Science, 2018 (Section I)

University of Calcutta. Draft BA (Honours)-CBCS Syllabus in Political Science, 2018 (Section I) University of Calcutta Draft BA (Honours)-CBCS Syllabus in Political Science, 2018 (Section I) A. Core Courses [Fourteen courses; Each course: 6 credits (5 theoretical segment+ 1 for tutorial-related segment).

More information

Bangladesh Women and Children Repression Prevention Act of 2000

Bangladesh Women and Children Repression Prevention Act of 2000 Bangladesh Women and Children Repression Prevention Act of 2000 ও ঠ আই ও ঠ ও ; আই ই :- ১ ও ঠ আই ও ঠ ও ; ২ আই ই :- ২ ছ আই,- ( ) ই আই ; (খ) ই ঝ ই ই ই ; ( ) আট ই আট ই খ ; (ঘ) ই ই আই ই ; (ঙ) ৯, Penal Code,

More information

THE ASSAM GAZETTE অস ধ ৰণ

THE ASSAM GAZETTE অস ধ ৰণ পঞ জ ভ ক ত নম বৰ- ৭৬৮ ৯৭ Registered No.768/97 অসম ৰ জপত র THE ASSAM GAZETTE অস ধ ৰণ EXTRAORDINARY প র প ত কর ত ত ত বৰ দ ব ৰ প রক শ ত PUBLISHED BY AUTHORITY ন 118 দ শপ ৰ শদনব ৰ 16 জ ন 2001 26 জজঠ 1923 (শক)

More information

Tracking Sentiment Evolution on User-Generated Content: A Case Study on the Brazilian Political Scene

Tracking Sentiment Evolution on User-Generated Content: A Case Study on the Brazilian Political Scene Tracking Sentiment Evolution on User-Generated Content: A Case Study on the Brazilian Political Scene Diego Tumitan, Karin Becker Instituto de Informatica - Universidade Federal do Rio Grande do Sul, Brazil

More information

Performance Evaluation of Cluster Based Techniques for Zoning of Crime Info

Performance Evaluation of Cluster Based Techniques for Zoning of Crime Info Performance Evaluation of Cluster Based Techniques for Zoning of Crime Info Ms. Ashwini Gharde 1, Mrs. Ashwini Yerlekar 2 1 M.Tech Student, RGCER, Nagpur Maharshtra, India 2 Asst. Prof, Department of Computer

More information

Mining Expert Comments on the Application of ILO Conventions on Freedom of Association and Collective Bargaining

Mining Expert Comments on the Application of ILO Conventions on Freedom of Association and Collective Bargaining Mining Expert Comments on the Application of ILO Conventions on Freedom of Association and Collective Bargaining G. Ritschard (U. Geneva), D.A. Zighed (U. Lyon 2), L. Baccaro (IILS & MIT), I. Georgiu (IILS

More information

arxiv: v2 [cs.si] 10 Apr 2017

arxiv: v2 [cs.si] 10 Apr 2017 Detection and Analysis of 2016 US Presidential Election Related Rumors on Twitter Zhiwei Jin 1,2, Juan Cao 1,2, Han Guo 1,2, Yongdong Zhang 1,2, Yu Wang 3 and Jiebo Luo 3 arxiv:1701.06250v2 [cs.si] 10

More information

Towards Tackling Hate Online Automatically

Towards Tackling Hate Online Automatically Towards Tackling Hate Online Automatically Nikola Ljubešić 1, Darja Fišer 2,1, Tomaž Erjavec 1 1 Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana 2 Department of Translation, University

More information

Clinton vs. Trump 2016: Analyzing and Visualizing Tweets and Sentiments of Hillary Clinton and Donald Trump

Clinton vs. Trump 2016: Analyzing and Visualizing Tweets and Sentiments of Hillary Clinton and Donald Trump Clinton vs. Trump 2016: Analyzing and Visualizing Tweets and Sentiments of Hillary Clinton and Donald Trump ABSTRACT Siddharth Grover, Oklahoma State University, Stillwater The United States 2016 presidential

More information

Subjectivity Classification

Subjectivity Classification Subjectivity Classification Wilson, Wiebe and Hoffmann: Recognizing contextual polarity in phrase-level sentiment analysis Wiltrud Kessler Institut für Maschinelle Sprachverarbeitung Universität Stuttgart

More information

Research and strategy for the land community.

Research and strategy for the land community. Research and strategy for the land community. To: Northeastern Minnesotans for Wilderness From: Sonia Wang, Spencer Phillips Date: 2/27/2018 Subject: Full results from the review of comments on the proposed

More information

Natural Language Technologies for E-Rulemaking. Claire Cardie Department of Computer Science Cornell University

Natural Language Technologies for E-Rulemaking. Claire Cardie Department of Computer Science Cornell University Natural Language Technologies for E-Rulemaking Claire Cardie Department of Computer Science Cornell University An E-Rulemaking Scenario Summarize the public commentary regarding the prohibition of potassium

More information

An Integrated Tag Recommendation Algorithm Towards Weibo User Profiling

An Integrated Tag Recommendation Algorithm Towards Weibo User Profiling An Integrated Tag Recommendation Algorithm Towards Weibo User Profiling Deqing Yang, Yanghua Xiao, Hanghang Tong, Junjun Zhang and Wei Wang School of Computer Science Shanghai Key Laboratory of Data Science

More information

Popularity Prediction of Reddit Texts

Popularity Prediction of Reddit Texts San Jose State University SJSU ScholarWorks Master's Theses Master's Theses and Graduate Research Spring 2016 Popularity Prediction of Reddit Texts Tracy Rohlin San Jose State University Follow this and

More information

Automated Classification of Congressional Legislation

Automated Classification of Congressional Legislation Automated Classification of Congressional Legislation Stephen Purpura John F. Kennedy School of Government Harvard University +-67-34-2027 stephen_purpura@ksg07.harvard.edu Dustin Hillard Electrical Engineering

More information

Experiments on Data Preprocessing of Persian Blog Networks

Experiments on Data Preprocessing of Persian Blog Networks Experiments on Data Preprocessing of Persian Blog Networks Zeinab Borhani-Fard School of Computer Engineering University of Qom Qom, Iran Behrouz Minaie-Bidgoli School of Computer Engineering Iran University

More information

THE GOP DEBATES BEGIN (and other late summer 2015 findings on the presidential election conversation) September 29, 2015

THE GOP DEBATES BEGIN (and other late summer 2015 findings on the presidential election conversation) September 29, 2015 THE GOP DEBATES BEGIN (and other late summer 2015 findings on the presidential election conversation) September 29, 2015 INTRODUCTION A PEORIA Project Report Associate Professors Michael Cornfield and

More information

A Qualitative and Quantitative Analysis of the Political Discourse on Nepalese Social Media

A Qualitative and Quantitative Analysis of the Political Discourse on Nepalese Social Media Proceedings of IOE Graduate Conference, 2017 Volume: 5 ISSN: 2350-8914 (Online), 2350-8906 (Print) A Qualitative and Quantitative Analysis of the Political Discourse on Nepalese Social Media Mandar Sharma

More information

Introduction to Text Modeling

Introduction to Text Modeling Introduction to Text Modeling Carl Edward Rasmussen November 11th, 2016 Carl Edward Rasmussen Introduction to Text Modeling November 11th, 2016 1 / 7 Key concepts modeling document collections probabilistic

More information

The IWSLT 2015 Evaluation Campaign

The IWSLT 2015 Evaluation Campaign The IWSLT 2015 Evaluation Campaign Mauro Cettolo, FBK-irst, Italy Jan Niehues, KIT, Germany Sebastian Stüker, KIT, Germany Luisa Bentivogli, FBK, Italy Roldano Cattoni, FBK, Italy Marcello Federico, FBK-irst,

More information

Media coverage in times of political crisis: a text mining approach

Media coverage in times of political crisis: a text mining approach Media coverage in times of political crisis: a text mining approach Enric Junqué de Fortuny Tom De Smedt David Martens Walter Daelemans Faculty of Applied Economics Faculty of Arts Faculty of Applied Economics

More information

Predicting Information Diffusion Initiated from Multiple Sources in Online Social Networks

Predicting Information Diffusion Initiated from Multiple Sources in Online Social Networks Predicting Information Diffusion Initiated from Multiple Sources in Online Social Networks Chuan Peng School of Computer science, Wuhan University Email: chuan.peng@asu.edu Kuai Xu, Feng Wang, Haiyan Wang

More information

Understanding factors that influence L1-visa outcomes in US

Understanding factors that influence L1-visa outcomes in US Understanding factors that influence L1-visa outcomes in US By Nihar Dalmia, Meghana Murthy and Nianthrini Vivekanandan Link to online course gallery : https://www.ischool.berkeley.edu/projects/2017/understanding-factors-influence-l1-work

More information

Final report. (revised version, 6 th December 2010) Development of national tools for the codification of occupations according to ISCO-08

Final report. (revised version, 6 th December 2010) Development of national tools for the codification of occupations according to ISCO-08 Vienna, 29 th October 2010 Final report (revised version, 6 th December 2010) Development of national tools for the codification of occupations according to ISCO-08 Grant agreement No 10202.2009.002-2009.407

More information

The Social Web: Social networks, tagging and what you can learn from them. Kristina Lerman USC Information Sciences Institute

The Social Web: Social networks, tagging and what you can learn from them. Kristina Lerman USC Information Sciences Institute The Social Web: Social networks, tagging and what you can learn from them Kristina Lerman USC Information Sciences Institute The Social Web The Social Web is a collection of technologies, practices and

More information

Overview. Ø Neural Networks are considered black-box models Ø They are complex and do not provide much insight into variable relationships

Overview. Ø Neural Networks are considered black-box models Ø They are complex and do not provide much insight into variable relationships Neural Networks Overview Ø s are considered black-box models Ø They are complex and do not provide much insight into variable relationships Ø They have the potential to model very complicated patterns

More information

Topicality, Time, and Sentiment in Online News Comments

Topicality, Time, and Sentiment in Online News Comments Topicality, Time, and Sentiment in Online News Comments Nicholas Diakopoulos School of Communication and Information Rutgers University diakop@rutgers.edu Mor Naaman School of Communication and Information

More information

Crystal: Analyzing Predictive Opinions on the Web

Crystal: Analyzing Predictive Opinions on the Web Crystal: Analyzing Predictive Opinions on the Web Soo-Min Kim and Eduard Hovy USC Information Sciences Institute 4676 Admiralty Way, Marina del Rey, CA 90292 {skim,hovy}@isi.edu Abstract In this paper,

More information

Identifying Factors in Congressional Bill Success

Identifying Factors in Congressional Bill Success Identifying Factors in Congressional Bill Success CS224w Final Report Travis Gingerich, Montana Scher, Neeral Dodhia Introduction During an era of government where Congress has been criticized repeatedly

More information

Fine-Grained Opinion Extraction with Markov Logic Networks

Fine-Grained Opinion Extraction with Markov Logic Networks Fine-Grained Opinion Extraction with Markov Logic Networks Luis Gerardo Mojica and Vincent Ng Human Language Technology Research Institute University of Texas at Dallas 1 Fine-Grained Opinion Extraction

More information

Recommendations For Reddit Users Avideh Taalimanesh and Mohammad Aleagha Stanford University, December 2012

Recommendations For Reddit Users Avideh Taalimanesh and Mohammad Aleagha Stanford University, December 2012 Recommendations For Reddit Users Avideh Taalimanesh and Mohammad Aleagha Stanford University, December 2012 Abstract In this paper we attempt to develop an algorithm to generate a set of post recommendations

More information

Ushio: Analyzing News Media and Public Trends in Twitter

Ushio: Analyzing News Media and Public Trends in Twitter Ushio: Analyzing News Media and Public Trends in Twitter Fangzhou Yao, Kevin Chen-Chuan Chang and Roy H. Campbell 3rd International Workshop on Big Data and Social Networking Management and Security (BDSN

More information

REPORT DOCUMENTATION PAGE. Trend Monitoring and Forecasting. Byeong Ho Kang N/A AOARD UNIT APO AP AFRL/AFOSR/IOA(AOARD)

REPORT DOCUMENTATION PAGE. Trend Monitoring and Forecasting. Byeong Ho Kang N/A AOARD UNIT APO AP AFRL/AFOSR/IOA(AOARD) REPORT DOCUMENTATION PAGE Form Approved OMB No. 0704-0188 The public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions,

More information

Instructors: Tengyu Ma and Chris Re

Instructors: Tengyu Ma and Chris Re Instructors: Tengyu Ma and Chris Re cs229.stanford.edu Ø Probability (CS109 or STAT 116) Ø distribution, random variable, expectation, conditional probability, variance, density Ø Linear algebra (Math

More information

now called The Assam Provincialised Colleges and Assam Non-Government College Management Rules, 2001 (as amended up-to-date)

now called The Assam Provincialised Colleges and Assam Non-Government College Management Rules, 2001 (as amended up-to-date) now called The Assam Provincialised Colleges and Assam Non-Government College Management Rules, 2001 (as amended up-to-date) To read along with the following Rules/OM/Govt. Letters:- Assam Non-Government

More information

Reconviction patterns of offenders managed in the community: A 60-months follow-up analysis

Reconviction patterns of offenders managed in the community: A 60-months follow-up analysis Reconviction patterns of offenders managed in the community: A 60-months follow-up analysis Arul Nadesu Principal Strategic Adviser Policy, Strategy and Research Department of Corrections 2009 D09-85288

More information

PREDICTING COMMUNITY PREFERENCE OF COMMENTS ON THE SOCIAL WEB

PREDICTING COMMUNITY PREFERENCE OF COMMENTS ON THE SOCIAL WEB PREDICTING COMMUNITY PREFERENCE OF COMMENTS ON THE SOCIAL WEB A Thesis by CHIAO-FANG HSU Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements for

More information

Procedure for the nomination and election of judges of the International Criminal Court

Procedure for the nomination and election of judges of the International Criminal Court Resolution ICC-ASP/3/Res.6 Adopted at the 6th plenary meeting, on 10 September 2004, by consensus ICC-ASP/3/Res.6 Procedure for the nomination and election of judges of the International Criminal Court

More information

Automatic Thematic Classification of the Titles of the Seimas Votes

Automatic Thematic Classification of the Titles of the Seimas Votes Automatic Thematic Classification of the Titles of the Seimas Votes Vytautas Mickevičius 1,2 Tomas Krilavičius 1,2 Vaidas Morkevičius 3 Aušra Mackutė-Varoneckienė 1 1 Vytautas Magnus University, 2 Baltic

More information

Category-level localization. Cordelia Schmid

Category-level localization. Cordelia Schmid Category-level localization Cordelia Schmid Recognition Classification Object present/absent in an image Often presence of a significant amount of background clutter Localization / Detection Localize object

More information

Reporting Rome Statute Offences to the ICC

Reporting Rome Statute Offences to the ICC 1 P a g e T h e R I N J F o u n d a t i o n W o r k i n g w i t h t h e I CC Reporting Rome Statute Offences to the ICC 1. Estimate which statutes the crime violates. 2. Read https://rinj.org/war-crime/

More information

Issues in Information Systems Volume 18, Issue 2, pp , 2017

Issues in Information Systems Volume 18, Issue 2, pp , 2017 IDENTIFYING TRENDING SENTIMENTS IN THE 2016 U.S. PRESIDENTIAL ELECTION: A CASE STUDY OF TWITTER ANALYTICS Sri Hari Deep Kolagani, MBA Student, California State University, Chico, skolagani@mail.csuchico.edu

More information

BOARD CHAIR Frederick P. Schaffer. BOARD MEMBERS Gregory T. Camp Richard Davis Marianne C. Spraggins Naomi B. Zauderer

BOARD CHAIR Frederick P. Schaffer. BOARD MEMBERS Gregory T. Camp Richard Davis Marianne C. Spraggins Naomi B. Zauderer BOARD CHAIR Frederick P. Schaffer BOARD MEMBERS Gregory T. Camp Richard Davis Marianne C. Spraggins Naomi B. Zauderer NEW YORK CITY CAMPAIGN FINANCE BOARD 2017 2018 VOTER ASSISTANCE ANNUAL REPORT NEW YORK

More information

Ranking Subreddits by Classifier Indistinguishability in the Reddit Corpus

Ranking Subreddits by Classifier Indistinguishability in the Reddit Corpus Ranking Subreddits by Classifier Indistinguishability in the Reddit Corpus Faisal Alquaddoomi UCLA Computer Science Dept. Los Angeles, CA, USA Email: faisal@cs.ucla.edu Deborah Estrin Cornell Tech New

More information

Users reading habits in online news portals

Users reading habits in online news portals Esiyok, C., Kille, B., Jain, B.-J., Hopfgartner, F., & Albayrak, S. Users reading habits in online news portals Conference paper Accepted manuscript (Postprint) This version is available at https://doi.org/10.14279/depositonce-7168

More information

Vote Compass Methodology

Vote Compass Methodology Vote Compass Methodology 1 Introduction Vote Compass is a civic engagement application developed by the team of social and data scientists from Vox Pop Labs. Its objective is to promote electoral literacy

More information

CS 229: r/classifier - Subreddit Text Classification

CS 229: r/classifier - Subreddit Text Classification CS 229: r/classifier - Subreddit Text Classification Andrew Giel agiel@stanford.edu Jonathan NeCamp jnecamp@stanford.edu Hussain Kader hkader@stanford.edu Abstract This paper presents techniques for text

More information

The U.S. Policy Agenda Legislation Corpus Volume 1 - a Language Resource from

The U.S. Policy Agenda Legislation Corpus Volume 1 - a Language Resource from The U.S. Policy Agenda Legislation Corpus Volume 1 - a Language Resource from 1947-1998 Stephen Purpura, John Wilkerson, Dustin Hillard Information Science, Dept. of Political Science, Dept. of Electrical

More information

CGAP Baseline Demand Side Study on Digital Remittances in Jordan: Key Qualitative Findings

CGAP Baseline Demand Side Study on Digital Remittances in Jordan: Key Qualitative Findings CGAP Baseline Demand Side Study on Digital Remittances in Jordan: Key Qualitative Findings September 16, 2016 Ipsos Public Affairs 2020 K Street, Suite 410 Washington, DC 20006 Tel: 202.463.7300 www.ipsos-na.com

More information

Analysis of Categorical Data from the California Department of Corrections

Analysis of Categorical Data from the California Department of Corrections Lab 5 Analysis of Categorical Data from the California Department of Corrections About the Data The dataset you ll examine is from a study by the California Department of Corrections (CDC) on the effectiveness

More information

Bayt.com Career Aspirations in the Middle East and North Africa. December 2014

Bayt.com Career Aspirations in the Middle East and North Africa. December 2014 Bayt.com Career Aspirations in the Middle East and North Africa December 2014 Section 1 PROJECT BACKGROUND Objective To understand the challenges and aspirations of MENA professionals. The study covers

More information

Introduction to the Virtual Issue: Recent Innovations in Text Analysis for Social Science

Introduction to the Virtual Issue: Recent Innovations in Text Analysis for Social Science Introduction to the Virtual Issue: Recent Innovations in Text Analysis for Social Science Margaret E. Roberts 1 Text Analysis for Social Science In 2008, Political Analysis published a groundbreaking special

More information

Survey Report Victoria Advocate Journalism Credibility Survey The Victoria Advocate Associated Press Managing Editors

Survey Report Victoria Advocate Journalism Credibility Survey The Victoria Advocate Associated Press Managing Editors Introduction Survey Report 2009 Victoria Advocate Journalism Credibility Survey The Victoria Advocate Associated Press Managing Editors The Donald W. Reynolds Journalism Institute Center for Advanced Social

More information

DU PhD in Home Science

DU PhD in Home Science DU PhD in Home Science Topic:- DU_J18_PHD_HS 1) Electronic journal usually have the following features: i. HTML/ PDF formats ii. Part of bibliographic databases iii. Can be accessed by payment only iv.

More information

Political Profiling using Feature Engineering and NLP

Political Profiling using Feature Engineering and NLP SMU Data Science Review Volume 1 Number 4 Article 10 2018 Political Profiling using Feature Engineering and NLP Chiranjeevi Mallavarapu Southern Methodist University, cmallavarapu@smu.edu Ramya Mandava

More information

RESULTS FRAMEWORK DOCUMENTS (RFD) ( ) QUARTERLY PROGRESS REPORT: EXPLANATORY NOTES 1 st QUARTER (APRIL TO JUNE, 2017)

RESULTS FRAMEWORK DOCUMENTS (RFD) ( ) QUARTERLY PROGRESS REPORT: EXPLANATORY NOTES 1 st QUARTER (APRIL TO JUNE, 2017) RESULTS FRAMEWORK DOCUMENTS (RFD) ( 18) QUARTERLY PROGRESS REPORT: EXPLANATORY NOTES 1 st QUARTER (APRIL TO JUNE, ) Central Sericultural Research & Training Institute Central Silk Board Ministry of Textiles;

More information

Entity Linking Enityt Linking. Laura Dietz University of Massachusetts. Use cursor keys to flip through slides.

Entity Linking Enityt Linking. Laura Dietz University of Massachusetts. Use cursor keys to flip through slides. Entity Linking Enityt Linking Laura Dietz dietz@cs.umass.edu University of Massachusetts Use cursor keys to flip through slides. Problem: Entity Linking Query Entity NIL Given query mention in a source

More information

The Role of Internet Adoption on Trade within ASEAN Countries plus People s Republic of China

The Role of Internet Adoption on Trade within ASEAN Countries plus People s Republic of China The Role of Internet Adoption on Trade within ASEAN Countries plus People s Republic of China Wei Zhai Prapatchon Jariyapan Faculty of Economics, Chiang Mai University Chiang Mai University, 239 Huay Kaew

More information

SIMPLY TRADE DATA NOW

SIMPLY TRADE DATA NOW SIMPLY TRADE DATA NOW 1 REWARDS PROGRAM 10,000,000 SEC COINS 290,000 USD REPORT TO THE GOOGLE FORM https://docs.google.com/forms/d/e/1faipqlsdylbmr5e-6i2lx8ctb_t5849uapgidszckxrogqrwu7trwsq/viewform BOUNTY

More information

National Human Trafficking Resource Center (NHTRC) Data Breakdown Maine State Report 12/7/2013-9/30/2013

National Human Trafficking Resource Center (NHTRC) Data Breakdown Maine State Report 12/7/2013-9/30/2013 National Human Trafficking Resource Center (NHTRC) Data Breakdown Maine State Report 12/7/2013-9/30/2013 This report covers National Human Trafficking Resource Center (NHTRC) case and call data from December

More information

Introduction-cont Pattern classification
