Datasets for Aspect-Based Sentiment Analysis in Bangla and Its Baseline Evaluation
Data Descriptor

Md. Atikur Rahman * and Emon Kumar Dey *

Institute of Information Technology, University of Dhaka, Dhaka 1000, Bangladesh
* Correspondence: bsse0521@iit.du.ac.bd (M.A.R.); emonkd@iit.du.ac.bd (E.K.D.)

Received: 20 March 2018; Accepted: 2 May 2018; Published: 4 May 2018
Data 2018, 3, 15

Abstract: With the extensive growth of user interactions through the prominent advances of the Web, sentiment analysis has obtained more focus from an academic and a commercial point of view. Recently, sentiment analysis in the Bangla language is progressively being considered as an important task, for which previous approaches have attempted to detect the overall polarity of a Bangla document. To the best of our knowledge, there is no research on aspect-based sentiment analysis (ABSA) of Bangla text. This can be described as being due to the lack of available datasets for ABSA. In this paper, we provide two publicly available datasets to perform the ABSA task in Bangla. One of the datasets consists of human-annotated user comments on cricket, and the other dataset consists of customer reviews of restaurants. We also describe a baseline approach for the subtask of aspect category extraction to evaluate our datasets.

Dataset License: CC0

Keywords: ABSA dataset; Bangla ABSA; aspect extraction from Bangla

1. Summary

People trust human opinion more so than traditional advertising. For example, consumers are used to seeking advice and recommendations from others before making decisions regarding important purchases. Word of mouth (WOM) has always been salient for consumers when making a decision. Such referrals have a strong impact on both customer decision-making and new customer acquisition for purchasing a company's product or service [1]. On the other hand, organizations are eager to mine all activities and interactions of people to understand what their weaknesses and strengths are. This understanding would help them to develop their organizational strategy in this competitive world.

Sentiment analysis (or opinion mining) is a process to determine the viewpoint of a person on a certain topic. It classifies the polarity of a document (e.g., a review, tweet, blog, or news item), that is, whether the communicated opinion is positive, negative, or neutral. There are three levels at which sentiment is analyzed [2]: the document level, the sentence level, and the aspect level. The document level considers that a document has an opinion on an entity, and the task is to classify whether an entire document expresses a positive or negative sentiment. The task at the sentence level regards sentences and determines whether each sentence expresses a positive, negative, or neutral opinion. Neither document-level nor sentence-level analysis discovers exactly what people liked and did not like. The aspect level (or aspect-based sentiment analysis, ABSA) performs a finer-grained analysis that identifies the aspects of a given document or sentence and the sentiment expressed towards each aspect. This level of analysis is the most detailed version and is capable of discovering complex opinions from reviews.

There are two major tasks when performing ABSA. The first is to extract the specific areas or aspects mentioned in an opinionated review. The second is to identify the polarity (either positive, negative, or neutral) for every aspect. For example, the following review of a restaurant reveals two aspects: service and food. Both aspects have a positive polarity.

"The service was excellent and the food was delicious."

As one can see, the names of the aspect categories are explicitly mentioned in this review. A review might also contain implicit categories; for example, "The staff makes you feel at home and the chicken is great." Here, the same aspects, service and food, are contained without being directly mentioned.

Semantic Evaluation (SemEval), a reputed workshop in the NLP domain, introduced a complete dataset [3] in English for the ABSA task. Later, this was expanded by adding multilingual datasets in which eight languages over seven domains were incorporated. To perform ABSA, datasets in several languages, such as Arabic [4], Czech [5], and French [6], were created. There is no dataset for Bangla in the field of ABSA. Consequently, no work is being done to extract aspects and to identify the corresponding polarities for Bangla reviews. We are currently working on a project to extract aspects from Bangla reviews or comments for a particular product of a company, as online shopping is very popular nowadays in Bangladesh and is growing rapidly. People like to buy products online after reading the comments of others.

In this paper, we have created two new datasets that serve as a benchmark for the ABSA domain in Bangla texts. We present two datasets named Cricket and Restaurant. The first dataset contains 2900 comments on cricket over 5 aspect categories, and the second dataset contains 2600 restaurant reviews. Because there is no work in Bangla for the ABSA task, we have introduced ABSA by extracting aspect categories from Bangla texts in order to evaluate our datasets.
We performed the task with different training approaches and found a satisfactory outcome compared to the evaluations of other languages.

There are some related works on which we founded the idea of this topic. The restaurant review dataset, provided by Ganu et al. [7], was used to improve rating predictions. Their annotations included six aspect categories and overall sentence polarities. They had not prepared a complete ABSA dataset, as the aspect category was present but the corresponding polarity of that aspect was absent. The SemEval 2014 evaluation campaign [3] extended their dataset by adding three more fields to the aspect category. They published their dataset with four fields for each review, that is, the aspect terms occurring in the sentences, the aspect term's polarity, the aspect category, and the aspect category's polarity. They also provided a laptop-review dataset manually annotated with similar entities as for the restaurant dataset. These are the benchmark datasets that researchers [8-11] have used for performing the ABSA task. The task was repeated in SemEval 2015 [12], for which the aspect categories were a combination of an entity type and an attribute type. Multilingual datasets were released in the SemEval 2016 workshop [13] on seven domains (restaurant, laptop, mobile phone, digital camera, hotel, and museum) in eight languages (English, Arabic, French, Chinese, Turkish, Spanish, Dutch, and Russian). A book-review dataset in the Arabic language was provided by [4]. They annotated book reviews into 14 categories and 4 types of polarities, including Conflict. In [5], the authors created an IT product-review dataset for the ABSA task, in which a total of 2200 reviews were contained.

The contributions of this paper are as follows:
- We have collected and presented two Bangla datasets for ABSA and have made them publicly available.
- We performed statistical and linguistic analysis on the datasets.
- We implemented state-of-the-art machine learning approaches on the collected datasets and found satisfactory accuracies.

2. Data Description

Digital Bangladesh is an integral part of the Bangladesh government's Vision 2021. The Internet is growing very fast over the country, and people are using different online platforms in every aspect of their lives. This encouraged us to construct datasets to analyze Bengali people's opinions and to extract their sentiments on different aspects. In this paper, we have constructed two different datasets, namely, the Cricket dataset and the Restaurant dataset, to evaluate people's opinions.

2.1. Cricket Dataset

The Cricket dataset consists of 2900 different comments from different online sources with five different aspect categories. Most of the comments were collected from Facebook pages. Some of the comments were collected from two popular Bengali websites, BBC Bangla and Daily Prothom Alo. This dataset was collected by the authors of this paper. The comments are of different lengths, and each review contains approximately [...] Bangla words.
The reasons behind choosing these websites for collecting data are given below:

- BBC Bangla and Daily Prothom Alo are very popular online news sites for the Bengali community all around the world. They are popular for publishing trustworthy and authentic news. Bengali people frequently read the news and sometimes make comments to share their opinion.
- Although people write their comments or opinions in both Bangla and English, most of the time they choose Bangla. We studied different articles and found that in almost 90% of cases, people expressed their opinion in Bangla.
- The Facebook page of Prothom Alo has over 13 million followers, and BBC Bangla has over 11 million. These two pages provide enormous numbers of text posts as well as a large number of comments.
- Cricket is one of the most popular games nowadays for Bengali people. We found that people are more interested in making comments on cricket-related news than on any other topic. Thus, we chose this category for our experiment.

Table 1 shows examples of comments collected from the Facebook pages.
Table 1. Example cricket-related comments on the Prothom Alo and BBC Bangla Facebook pages.

Comments (six comments; the first four are from the Prothom Alo Facebook page and the last two from the BBC Bangla Facebook page):
ম শর ফ এক জ দ ক র ন ম য ন মট ন লই মন ভ র য য় আম দর জ হর,জনসন, প লক, ট ল ন ই ত ব এক জন ম শর ফ আ ছ ব ল র দরও দ ষ নই, দ ষট 100% ম নজ ম র পস ব ল র দর জন উই কট ত র ত ত র ই ক র ন দ ষট ম নজ ম ন ম ন ল আ ম নব আশ ক র ত স কন অ ত ত দ ল ফর ব আর নয় মত খল ব এব চ র আউট কর ব এখন দশ ক দর ক ছ জন য় হ ট /ওয় ন ড ম ন ষ এখন দখ তই চ য়ন.. হ র লও ব ল দশ জত লও ব ল দশ আগ ম ত আব র আমর ই জত ব ইশ! আম দর য দ বর ট ক হ লর ম ত একট ব ট ন থ ক ত আম র পর মশ হ ল কট র দর ক প রট জগত থ ক দ র র খ ত হ ব

People usually comment in Bangla about the news. We also found that 5-10% of the time, they commented in English or wrote Bangla sentences in the English alphabet. We did not consider these opinions in our dataset. In addition, some comments had only emoticons and no other text or words. We also omitted these from our dataset. All processes were done manually by the authors. The following section describes the annotation process for the collected corpus of cricket-related comments.
Annotation of the Cricket Dataset

The Bangla text on cricket was annotated jointly by the authors, a group of second-year students of BSSE, and two employees from the Institute of Information Technology, University of Dhaka, Bangladesh. All participants agreed to categorize the whole dataset into five different aspect categories. These were bowling, batting, team, team management, and other. Given a comment, the task of the annotators was to recommend the aspect category and polarity labels for each. Three types of polarities were considered, that is, positive, negative, and neutral. Table 2 shows information about the participants.

Table 2. Information about the participants in data collection.

Participant ID | Gender | Profession | Task
P1 | Male | MS student/author | Data collection (Cricket) and annotation
P2 | Male | Faculty/author | Data collection (Cricket) and annotation
P3 | Male | Graduate student | Annotation (Cricket) and translation (Restaurant)
P4 | Female | Graduate student | Annotation (Cricket) and translation (Restaurant)
P5 | Female | Graduate student | Annotation (Cricket) and translation (Restaurant)
P6 | Male | Graduate student | Annotation (Cricket) and translation (Restaurant)
P7 | Male | Graduate student | Annotation (Cricket) and translation (Restaurant)
P8 | Male | Graduate student | Annotation (Cricket) and translation (Restaurant)
P9 | Female | Accountant | Annotation
P10 | Male | Officer | Annotation

Each participant categorized every comment of the Cricket dataset. We applied the majority voting technique to make the final decision about the aspect category and polarity of a sentence. As an example, we have taken the following comment:

এই প চ র ন কর ট ফ, ব ল ন স হ ভ ল হ য় ছ

The voting result found for the comment is given in Table 3.

Table 3. Voting example to define category and polarity.

Comment: এই প চ র ন কর ট ফ, ব ল ন স হ ভ ল হ য় ছ

Participant | Voting for Category | Voting for Polarity
P1 | Bowling | Positive
P2 | Bowling | Positive
P3 | Batting | Negative
P4 | Batting | Negative
P5 | Other | Neutral
P6 | Bowling | Positive
P7 | Other | Neutral
P8 | Bowling | Positive
P9 | Bowling | Positive
P10 | Batting | Negative

From Table 3, we can see that the comment had three votes for Batting with a negative polarity, two votes for Other with a neutral polarity, and four votes for Bowling with a positive polarity. Thus, our method determined this comment as being in the Bowling category with a positive polarity. We also had ties for some comments. In this situation, we took both categories with their polarity in our dataset. Table 4 shows an example of this scenario.
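The voting rule described above, including the rule of keeping every category that ties for first place, can be sketched as follows. This is a minimal illustration and not the authors' tooling: the function name `aggregate_votes` and the example votes are hypothetical, and choosing the polarity as the most common polarity among the annotators who voted for a winning category is one reading of the procedure.

```python
from collections import Counter


def aggregate_votes(votes):
    """Combine annotator votes into final (category, polarity) labels.

    votes: list of (category, polarity) pairs, one pair per annotator.
    Every category that shares the highest vote count is kept (tie rule).
    Each kept category receives the most common polarity among the
    annotators who voted for that category.
    """
    category_counts = Counter(category for category, _ in votes)
    top = max(category_counts.values())
    labels = []
    for category, count in category_counts.items():
        if count != top:
            continue
        polarity_counts = Counter(p for c, p in votes if c == category)
        # Assumption: a polarity tie is resolved by first occurrence,
        # because the text does not describe that case.
        polarity = polarity_counts.most_common(1)[0][0]
        labels.append((category, polarity))
    return sorted(labels)


# Illustrative votes (not taken from the dataset).
clear_case = ([("bowling", "positive")] * 5
              + [("batting", "negative")] * 3
              + [("other", "neutral")] * 2)
tied_case = [("batting", "negative")] * 5 + [("team", "negative")] * 5

clear_result = aggregate_votes(clear_case)  # one winning category
tied_result = aggregate_votes(tied_case)    # both tied categories are kept
```

Returning a list rather than a single label keeps the output shape the same for clear majorities and for ties, which matches a dataset in which one comment may appear under more than one category.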
Table 4. Voting for category identification.

Comment: ওর 200 ক র ছ, ত মর 100 কর ত প র ব ন?

Participant | Voting for Category | Voting for Polarity
P1 | Batting | Negative
P2 | Team | Negative
P3 | Batting | Negative
P4 | Batting | Negative
P5 | Batting | Negative
P6 | Team | Negative
P7 | Team | Negative
P8 | Batting | Negative
P9 | Team | Negative
P10 | Team | Negative

We can see from Table 4 that 50% of the evaluators voted for Batting, and 50% of the votes were for the Team category, both with a negative polarity. As they had tied, our algorithm took both of these categories in the labeled dataset with a negative polarity.

We also faced another kind of problem in constructing the dataset. After calculating the category, we found dissimilarities for some comments among the participants regarding the polarity of the comment. For example, when we considered the following comment, we found the voting result given in Table 5.

Table 5. Problem related to polarity determination.

Comment: র ক ব ড় হ য় গ ছ, খল প রন ত হ ল আজ কভ ব ক কর ল?

Participant | Voting for Category | Voting for Polarity
P1 | Bowling | Positive
P2 | Bowling | Positive
P3 | Team |
P4 | Bowling | Negative
P5 | Other |
P6 | Team |
P7 | Other |
P8 | Team |
P9 | Bowling | Positive
P10 | Team |

From Table 5, we can see that, for the comment, both the Bowling and Team categories had four votes. Thus, we keep both of these categories in our annotated dataset. In polarity, there were three positive votes and one negative vote for bowling. Thus, we considered it as having a positive polarity and added it to our dataset. Table 6 shows a sample of the labeled Cricket dataset.

In Table 7, a summary of the complete Cricket dataset is presented. We can see that there are a total of 3034 different comments with five different categories, that is, Batting, Bowling, Team, Team Management, and Other. Each of the categories contains three different polarities: positive, negative, and neutral. For example, the Batting category contains a total of 583 comments, of which 138 are positive, 389 are negative, and 56 are of neutral polarity. The Bowling category contains 154 positive, 145 negative, and 33 neutral polarity comments. The Team, Team Management, and Other categories contain totals of 774, 332, and 1013 comments, respectively.
Table 6. A part of the Cricket dataset in xlsx format.

Comment | Category | Polarity
বয প র ন eট ধ ম t ঘর টন ছ ড় কছ ন | other | neutral
ভক মন ট iগ র দর জনয | team | positive
ব ল দশ eখ ন ত ম মর য গয o পন র প ল ন | batting | negative
ব ল দশ হ র ব আজ | team | negative
স তজন sন র ন য় ম ঠ ব তল ট ন ন য য় ময চ জত য য় ন | bowling | negative
ট নর পচ ব ন য় ঔষধ খ জ ল ভ ক ব u n পচ ব ন লi হয় | team management | negative
eট প র দ ম ফ k eকট খল হi স | team | negative
জয় ধ সম য়র a পk | other | positive
যi জ য়র সম ন!! | other | neutral
ব ল রর য প রম ন শটর বল দ c- ত ত র ন কত ব শ হয় সট i দখ র বষয়! | bowling | negative
ত ক টs আর o ডআi দ ল নয় মত চ i | team | positive
ব ল দশ k কট আ র e গ য় য ব, o পন র দর eকট ভ ল কর ত হ ব | team | positive
ব ল দশ k কট আ র e গ য় য ব, o পন র দর eকট ভ ল কর ত হ ব | batting | negative
ফ রi চমক দখ লন র j ক | bowling | positive
নব ন দর স য গ দয় দরক র. | other | neutral
ব ল পচ ত ব আম দর বয টসময ন দর আuট ল আtহতয ছ ড় আর কছ i নয় | batting | negative
ব ল পচ ত ব আম দর বয টসময ন দর আuট ল আtহতয ছ ড় আর কছ i নয় | bowling | neutral
দ য়tj ন হ নত র aভ ব? | other | negative

Table 7. The complete statistics of the Cricket dataset.

Category | Positive | Negative | Neutral | Total
Batting | 138 | 389 | 56 | 583
Bowling | 154 | 145 | 33 | 332
Team |  |  |  | 774
Team Management |  |  |  | 332
Other |  |  |  | 1013
Total Comments |  |  |  | 3034

Analysis of the Proposed Cricket Dataset

We used Zipf's law [14] for our proposed dataset.
Zipf's law is an empirical observation stating that, in a collection of data, the frequency of a given word should be inversely proportional to its rank in the corpus. The most frequent word (rank 1) in a dataset should occur approximately twice as often as the second most frequent word, three times as often as the third most frequent, and so on. Figure 1 shows the diagram in which we plotted the words of our Cricket dataset. The plot follows the trend of Zipf's law. We also calculated the reliability of the annotation process. The value of the intraclass correlation (ICC) was [...].

2.2. Restaurant Dataset

To create the Bangla Restaurant dataset, we took help directly from the English benchmark Restaurant dataset [3]. All comments were abstractly translated into Bangla with their exact annotation. The original English dataset contains a total of 2800 different comments. Participants from the same group involved in the Cricket dataset's creation were involved in the translation process of the Restaurant dataset, except participants P9 and P10. We divided the original dataset equally into eight parts and distributed these to the participants. They translated their assigned parts of the original English dataset abstractly. Finally, participants P1 and P2 merged the separate sections and performed an extensive proofread.
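The Zipf's-law check applied to the word frequencies of a dataset can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' analysis code: the helper names `rank_frequency` and `zipf_slope` are hypothetical, tokens are assumed to come from whitespace-splitting the comments, and the toy corpus is constructed so that the word at rank r occurs exactly 60 / r times. On a log-log scale, Zipf's law predicts a slope close to -1.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return word frequencies sorted from most to least frequent."""
    return sorted(Counter(tokens).values(), reverse=True)


def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Toy corpus (hypothetical): counts 60, 30, 20, 15, 12, 10 for ranks 1..6.
toy_tokens = []
for rank, word in enumerate(["w1", "w2", "w3", "w4", "w5", "w6"], start=1):
    toy_tokens.extend([word] * (60 // rank))

freqs = rank_frequency(toy_tokens)
slope = zipf_slope(freqs)  # close to -1 for an ideal Zipfian corpus
```

For a real corpus the slope is only approximately -1, so a plot of rank against frequency, as in the figures of this paper, remains the more informative check.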
Figure 1. Distribution of word frequencies of the Cricket dataset using Zipf's law.

Annotation Schema for Restaurant

The Restaurant reviews dataset [3] used in this paper was abstractly translated into Bangla. There were five types of aspect categories, that is, Food, Price, Service, Ambiance, and Miscellaneous. As the objective was to identify the aspect category and the corresponding polarity, the participants did not add the aspect terms or their polarities. In terms of the polarity of an aspect category, we considered only three polarity labels, that is, positive, negative, and neutral. The original dataset consisted of four different polarity labels: positive, negative, neutral, and conflict. In our translated Bangla dataset, we omitted the conflict category and assumed it to be the same as the neutral category. The annotators were asked to assign each translated Bangla restaurant review to the categories and polarities of the original dataset. Table 8 shows a sample of the translated Restaurant dataset.

Table 8. A part of the Restaurant dataset in xlsx format.

Review | Category | Polarity
খ ব স মত আসন আ ছ eব খ দয প oয় র জনয য থ a পk কর ত হ ব | ambience | negative
খ ব স মত আসন আ ছ eব খ দয প oয় র জনয য থ a পk কর ত হ ব | service | negative
দ ম ত লন ম লকভ ব কম | price | positive
3 i ছল মজ দ র | food | positive
য দo খ ব র ট চমৎক র ছল, e ট সs ছল ন | food | positive
য দo খ ব র ট চমৎক র ছল, e ট সs ছল ন | price | negative
খ ব ভ ল! | miscellaneous | positive
আচ রর স য জন খ ব ভ ল ছল | food | positive
ধ ম t র n i য সর ত নয়, সব সবসময় ম ন য গ eব ভ ল হ য় ছ | food | positive
ধ ম t র n i য সর ত নয়, সব সবসময় ম ন য গ eব ভ ল হ য় ছ | service | positive
সবর দ eক ট স nর ভড়, কn ক ন ক ল হল নi | ambience | positive
সj alsl eব প র র - ব n ব pশ স কর কছ i নi | ambience | neutral
আ ম ন ত য আম ক ব রবর ফ র য ত হ ব,!!! | miscellaneous | positive
সm বত e ট eক ট ছ ট আর মদ য়ক রs রn,ভ ল সj র স র ম nক aন ভ ত | ambience | positive
য দo খ দয ভ ল ছল প র বষন ছল ব | food | positive
য দo খ দয ভ ল ছল প র বষন ছল ব | service | negative
কম র ম ন য গ eব বn tপ ণর | service | positive
খ ব র ভ ল ছল | food | positive

Table 9 shows the complete statistics of the Bangla Restaurant dataset. We can see from the table that the five different categories, that is, Food, Price, Service, Ambiance, and Miscellaneous, contained 713, 178, 336, 234, and 613 reviews, respectively, with three different polarities. For example, the Food category contained 500 positive, 126 negative, and 87 neutral sentiment labels. The Service category contained 186 positive, 118 negative, and 32 neutral sentiments. We also found that the Restaurant dataset followed Zipf's law, which is shown in Figure 2.

Table 9. Complete statistics of the Bangla Restaurant dataset.

Category | Positive | Negative | Neutral | Total
Food | 500 | 126 | 87 | 713
Price |  |  |  | 178
Service | 186 | 118 | 32 | 336
Ambiance |  |  |  | 234
Miscellaneous |  |  |  | 613

Figure 2. Word frequency of the Bangla Restaurant dataset according to Zipf's law.

3. Baseline Evaluation

Our objective is to provide benchmark datasets for Bangla ABSA. Our datasets are designed for the two major tasks of ABSA. These are aspect category extraction and identification of the polarity for each aspect category. In this paper, we experimented with the first subtask, that is, extraction of the aspect category. We applied three major steps to extract the aspect category. Firstly, preprocessing was performed on the dataset. After this, we extracted features from the data and finally performed classification using some popular classification models.

3.1. Preprocessing and Feature Extraction

In the preprocessing phase, each Bangla document was represented as a bag of words. We applied traditional preprocessing steps for evaluation. Firstly, punctuation and stop words were removed from each comment. After this, we removed digits from our dataset, because we found that digits were not necessary for the aspect category. Finally, we tokenized each Bangla word from our dataset. Thus, a vocabulary of Bangla words was prepared after preprocessing. We created a feature matrix in which each review was represented by a vector over that vocabulary. Term frequency-inverse document frequency (TF-IDF) was used for calculating the features.

3.2. Results

In the training phase, the extracted feature sets were trained by popular supervised machine learning algorithms. Because this was a multi-label classification problem, we trained our models by setting up multi-label output. We used linear SVC in the support vector machine (SVM) implementation. The following machine learning algorithms were used:

I. Support vector machine (SVM)
II. Random forest (RF)
III. K-nearest neighbor (KNN)

After training was completed, our proposed Bangla test dataset was executed on the trained model. The result is shown in the following table and figure. Table 10 shows the results for the task of aspect category extraction for the datasets we have presented in this paper. We can see that using SVM, we obtained the highest precision rate for both datasets. Both datasets showed a low recall and F1-score. Figure 3 shows the overall accuracy of the models using our datasets. The inherent nature of the datasets is the reason behind the lower performance of the models for both datasets. People share their opinion with their individual judgment. Therefore, the variety of opinions in the datasets is much larger. On the other hand, aspect extraction is a multi-label classification problem. One's opinion might have multiple aspect categories. Conventional classifiers miss some of these aspect categories.

Table 10. Performance of the proposed datasets.

Dataset | Model | Precision | Recall | F1-Score
Cricket | SVM |  |  |
Cricket | RF |  |  |
Cricket | KNN |  |  |
Restaurant | SVM |  |  |
Restaurant | RF |  |  |
Restaurant | KNN |  |  |

Figure 3. The result of three models on our datasets.

4. Conclusions and Future Work

Two datasets are provided for ABSA of Bangla text. These datasets have been designed to perform two tasks covering aspect category extraction and identification of the polarity for that aspect category. We also report baseline results to evaluate the task of aspect category extraction. As future plans, we aim to enhance our work by including further domains such as cars, mobiles, and laptops. We are working on more advanced methods for ABSA of Bangla text using our datasets to achieve better performance. These results can be improved if we process and train the datasets in a more sophisticated way.
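As an illustration of the preprocessing and TF-IDF feature extraction steps used for the baseline, a minimal pure-Python sketch is given here. It rests on several assumptions that are not taken from the paper: the function names `preprocess` and `tfidf_matrix`, the tiny stop-word list, and the example sentences are hypothetical, and the weighting shown (relative term frequency multiplied by log(N / document frequency)) is only one common TF-IDF variant, since the exact formula is not stated.

```python
import math
import unicodedata


def preprocess(text, stop_words):
    """Strip punctuation and digits, split on whitespace, drop stop words."""
    kept = []
    for ch in text:
        category = unicodedata.category(ch)
        # 'P*' is punctuation (including the Bangla danda) and
        # 'N*' is numbers (including Bangla digits).
        kept.append(" " if category[0] in ("P", "N") else ch)
    return [tok for tok in "".join(kept).split() if tok not in stop_words]


def tfidf_matrix(documents):
    """documents: list of token lists. Returns (vocabulary, matrix)."""
    vocabulary = sorted({tok for doc in documents for tok in doc})
    n_docs = len(documents)
    doc_freq = {term: sum(term in doc for doc in documents)
                for term in vocabulary}
    matrix = []
    for doc in documents:
        row = []
        for term in vocabulary:
            tf = doc.count(term) / len(doc) if doc else 0.0
            idf = math.log(n_docs / doc_freq[term])
            row.append(tf * idf)
        matrix.append(row)
    return vocabulary, matrix


# Hypothetical stop words and reviews, for illustration only.
stop_words = {"খুব", "ছিল"}          # "very", "was"
reviews = [
    "খাবার খুব ভালো ছিল!",           # "The food was very good!"
    "দাম ১০০ টাকা, খুব বেশি।",        # "The price is 100 taka, very high."
]
token_lists = [preprocess(review, stop_words) for review in reviews]
vocabulary, features = tfidf_matrix(token_lists)
```

Each row of `features` is the vector for one review over the shared vocabulary, which is the shape a multi-label classifier would take as input.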
In this work, we have taken the whole vocabulary as features for evaluation after removing punctuation, stop words, and digits. Some state-of-the-art techniques for information gain can be applied to the dataset before classification, after the preprocessing steps, to attain better results.

4. Conclusions and Future Work

Two datasets are provided for ABSA of Bangla text. These datasets have been designed to perform two tasks: aspect category extraction and identification of the polarity of that aspect category. We also report baseline results to evaluate the task of aspect category extraction.
As future plans, we aim to enhance our work by including further domains such as cars, mobiles, and laptops. We are working on more advanced methods for ABSA of Bangla text using our datasets to achieve better performance.

Author Contributions: All authors contributed equally to this work, and have read and approved the final manuscript.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Trusov, M.; Bucklin, R.E.; Pauwels, K. Effects of word-of-mouth versus traditional marketing: Findings from an internet social networking site. J. Mark. 2009, 73.
2. Jeyapriya, A.; Selvi, C.K. Extracting Aspects and Mining Opinions in Product Reviews Using Supervised Learning Algorithm. In Proceedings of the International Conference on Electronics and Communication Systems (ICECS), Coimbatore, India, February.
3. Pontiki, M.; Galanis, D.; Pavlopoulos, J.; Papageorgiou, H.; Androutsopoulos, I.; Manandhar, S. SemEval-2014 Task 4: Aspect Based Sentiment Analysis. Available online (accessed on 3 May 2018).
4. Al-Smadi, M.; Qawasmeh, O.; Talafha, B.; Quwaider, M. Human Annotated Arabic Dataset of Book Reviews for Aspect Based Sentiment Analysis. In Proceedings of the International Conference on Future Internet of Things and Cloud (FiCloud), Rome, Italy, August.
5. Tamchyna, A.; Fiala, O.; Veselovská, K. Czech Aspect-Based Sentiment Analysis: A New Dataset and Preliminary Results. Available online (accessed on 3 May 2018).
6. Apidianaki, M.; Tannier, X.; Richart, C. Datasets for Aspect-Based Sentiment Analysis in French. Available online (accessed on 3 May 2018).
7. Gayatree, G.; Elhadad, N.; Marian, A. Beyond the Stars: Improving Rating Predictions Using Review Text Content. Available online (accessed on 3 May 2018).
8. Kiritchenko, S.; Zhu, X.; Cherry, C.; Mohammad, S. NRC-Canada-2014: Detecting Aspects and Sentiment in Customer Reviews. Available online (accessed on 3 May 2018).
9. Kiritchenko, S.; Zhu, X.; Cherry, C.; Mohammad, S. Supervised and Unsupervised Aspect Category Detection for Sentiment Analysis with Co-occurrence Data. In IEEE Transactions on Cybernetics; IEEE: Piscataway, NJ, USA.
10. Soujanya, P.; Cambria, E.; Gelbukh, A. Aspect extraction for opinion mining with a deep convolutional neural network. Knowl.-Based Syst. 2016, 108.
11. Pengfei, L.; Joty, S.; Meng, H. Fine-Grained Opinion Mining with Recurrent Neural Networks and Word Embeddings. Available online (accessed on 3 May 2018).
12. Pontiki, M.; Galanis, D.; Papageorgiou, H.; Manandhar, S.; Androutsopoulos, I. SemEval-2015 Task 12: Aspect Based Sentiment Analysis. Available online (accessed on 3 May 2018).
13. Pontiki, M.; Galanis, D.; Papageorgiou, H.; Androutsopoulos, I.; Manandhar, S.; AL-Smadi, M.; Al-Ayyoub, M.; Zhao, Y.; Qin, B.; De Clercq, O.; et al. SemEval-2016 Task 5: Aspect Based Sentiment Analysis. Available online (accessed on 3 May 2018).
14. Pak, A.; Paroubek, P. Twitter as a Corpus for Sentiment Analysis and Opinion Mining. Available online (accessed on 3 May 2018).

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.