COMP38120: Documents, Services and Data on the Web
Laboratory Exercise 1.3
Author: Cristiano Ruschel Marques Dias
Description
The indexing algorithm, implemented using the map-reduce architecture, allows anyone with access to the output data to run queries that take into account the position of each word in each document and the number of occurrences of each word in each document. The features implemented were:
● Case folding - A context-based capitalization algorithm is used to decide when to leave a word capitalized. Essentially, whenever a word starts a sentence it is presumed that it would not normally be capitalized, and it is therefore lower-cased. Considerable thought was given to whether a casing algorithm was worth implementing at all, given that ignoring case generally gives results that are good enough, and arguably just as good. Since its use has little impact on performance, and following the opinion given in [1], the algorithm uses it, but results are similar without it.
● Punctuation trimming - Instead of removing all punctuation, we trim punctuation from the edges of words, since punctuation in the middle of a word can carry meaning, for example the number 8.5 or the term in-mapper. To do this we also had to separately trim the reference tags generated by Wikipedia, due to their peculiar form - [ number ] - so that mechanisms to modify the importance of words inside a reference in the document could easily be implemented from this point.
● Stop words and stemming - After the steps above, stop words - words that do not add information to the text - are removed using the algorithm provided. The remaining words are then stemmed, also using a provided algorithm.
● In-mapper combining - The map-reduce pattern called "in-mapper combining" was implemented.
This means that instead of each key-value pair being written directly into the context to be handled by the reducer, each mapper preprocesses its pairs so as to reduce the amount of data sent to the reducers and increase the overall speed of the map-reduce run. It was implemented so that the pre-combined key-value pairs are written to the context as soon as the mapper finishes, or once the Map holding them has used too much memory - the limit is a configurable constant. It is similar to the last implementation found in [2], though no code was copied.
● Positional indexing - The position of each occurrence of each token emitted by the mapper - a token being a simplified version of a word produced by the operations above - is kept and propagated to the output, so that queries can take the position of a word in the document into account.
● Flagging of important items - The modifications needed to propagate the flagging of important items to the output were not made; therefore, even though this flagging is checked at some points, the information is not sent to the output.
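The edge-trimming approach described under punctuation trimming can be sketched as follows. This is a minimal illustration, not the coursework code: the class and method names are invented, and the regular expressions assume Java's POSIX \p{Punct} class covers the punctuation of interest.

```java
public class PunctuationTrim
{
    // Strip punctuation only from the start and end of a token,
    // preserving internal punctuation such as "8.5" or "in-mapper".
    static String trimPunctuation(String token)
    {
        return token.replaceAll("^\\p{Punct}+", "")
                    .replaceAll("\\p{Punct}+$", "");
    }

    public static void main(String[] args)
    {
        System.out.println(trimPunctuation("word."));       // word
        System.out.println(trimPunctuation("8.5"));         // 8.5
        System.out.println(trimPunctuation("(in-mapper),")); // in-mapper
    }
}
```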
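The in-mapper combining pattern described above can be sketched without Hadoop as follows. This is an assumption-laden simplification: the emitted list stands in for Hadoop's context.write, the flush-on-finish call stands in for the mapper's cleanup, and the buffer name and threshold constant are invented for illustration.

```java
import java.util.*;

public class InMapperCombiner
{
    static final int MAX_BUFFER = 1000;                    // flush threshold (tunable)
    static Map<String, Integer> buffer = new HashMap<>();  // local pre-combined pairs
    static List<String> emitted = new ArrayList<>();       // stands in for context.write

    // Instead of emitting every (token, 1) pair, accumulate counts
    // locally and flush only when the buffer grows too large.
    static void map(String token)
    {
        buffer.merge(token, 1, Integer::sum);
        if (buffer.size() >= MAX_BUFFER)
            flush();
    }

    // Called when the mapper finishes, or when the buffer has used
    // too much memory, mirroring the behaviour described above.
    static void flush()
    {
        for (Map.Entry<String, Integer> e : buffer.entrySet())
            emitted.add(e.getKey() + "\t" + e.getValue());
        buffer.clear();
    }

    public static void main(String[] args)
    {
        for (String t : new String[] { "index", "map", "index", "reduce", "index" })
            map(t);
        flush(); // mapper finished: emit the remaining pre-combined pairs
        System.out.println(emitted); // three combined pairs instead of five raw ones
    }
}
```

Buffering in the mapper trades a bounded amount of memory for far less intermediate data crossing the shuffle, which is the speed-up the report attributes to this pattern.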
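The positional index implied above can be sketched as a token-to-postings map. The structure and names here are illustrative assumptions, not the coursework's Writable types: each token maps to a per-document list of word positions.

```java
import java.util.*;

public class PositionalIndex
{
    // token -> (document -> positions of that token in the document)
    static Map<String, Map<String, List<Integer>>> index = new HashMap<>();

    // Record one occurrence of a token at a given word position.
    static void addOccurrence(String token, String doc, int position)
    {
        index.computeIfAbsent(token, k -> new HashMap<>())
             .computeIfAbsent(doc, k -> new ArrayList<>())
             .add(position);
    }

    public static void main(String[] args)
    {
        String[] words = { "parallel", "index", "parallel" };
        for (int i = 0; i < words.length; i++)
            addOccurrence(words[i], "doc1.txt", i);
        System.out.println(index.get("parallel").get("doc1.txt")); // [0, 2]
    }
}
```

Keeping positions rather than bare counts is what lets a query layer support phrase or proximity matching on top of the inverted index.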
Performance
All the operations performed have a run-time complexity of O(n) in the length of the input, which supports the speed and scalability of the implementation. The algorithm takes some time to run because of the overheads of the map-reduce architecture, but as the input grows this overhead becomes comparatively insignificant. The in-mapper combining pattern helps avoid bottlenecks, such as the algorithm slowing down due to the excessive - and normally costly - memory operations that would be caused by the mapper sending an unnecessarily large amount of data to the reducers; this makes the algorithm more scalable. The pattern also does not overload memory, and because the amount of memory used by the in-mapper combiner can be changed, the algorithm can be tuned for different users or situations. The bottlenecks of the algorithm as implemented are the amount of memory on the machine - though it would take a very large input to have a real impact on performance - and the number of cores, since these limit the number of map-reduce operations that can run in parallel.
/**
 * Basic Inverted Index
 *
 * This Map Reduce program should build an Inverted Index from a set of files.
 * Each token (the key) in a given file should reference the file it was found
 * in.
 *
 * The output of the program should look like this:
 * sometoken [file001, file002, ... ]
 *
 * @author Kristian Epps
 */
package uk.ac.man.cs.comp38120.exercise;

import java.io.*;
import java.util.*;
import java.util.regex.Pattern;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.commons.cli.CommandLine;
import org.apache.commons.cli.CommandLineParser;
import org.apache.commons.cli.HelpFormatter;
import org.apache.commons.cli.OptionBuilder;
import org.apache.commons.cli.Options;
import org.apache.commons.cli.ParseException;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.log4j.Logger;

import uk.ac.man.cs.comp38120.io.array.ArrayListWritable;
import uk.ac.man.cs.comp38120.io.pair.PairOfStringFloat;
import uk.ac.man.cs.comp38120.io.pair.PairOfWritables;
import uk.ac.man.cs.comp38120.util.XParser;
import uk.ac.man.cs.comp38120.ir.StopAnalyser;
import uk.ac.man.cs.comp38120.ir.Stemmer;

import static java.lang.System.out;

public class BasicInvertedIndex extends Configured implements Tool
{
    private static final Logger LOG = Logger
            .getLogger(BasicInvertedIndex.class);

    public static class Map extends Mapper