Optical Character Recognition
A Major Qualifying Project Report submitted to the faculty of GURU NANAK DEV INSTITUTE OF TECHNOLOGY in partial fulfillment of the requirements for the Diploma in Computer Sciences by
Kushagra Chadha, Amit Kumar
April 20, 2016
Professor Muneesh Meena, Major Advisor
Abstract
Our project aimed to understand, develop and improve the open Optical Character Recognizer (OCR) software so that it better handles some of the more complex recognition issues, such as unique language alphabets and special characters such as mathematical symbols. We developed the OCR to work with any language by creating support for UTF-8 character encoding. The stages of an OCR system are: uploading a scanned image from the computer; segmentation, in which we extract the text zone from the image; recognition of the text; and post-processing, in which the output of the previous stage goes through an error detection and correction phase. This report also describes the user interface provided with the OCR, with which a user can easily add to or modify the segmentation produced by the OCR system.
Table of Contents
Chapter 1: Background
1.1 Introduction
1.2 History of OCR
1.2.1 Template-Matching Method
1.2.2 Peephole Method
1.2.3 Structured Analysis Method
1.2.4 Factors influencing OCR software performance
1.3 Independent Component Analysis
1.4 Energy-based Models for sparse overcomplete representations
1.5 Finite State Transducers in Language and Speech Processing
1.5.1 Sequential Transducers
1.5.2 Weighted Finite State Transducers
1.5.3 Transducers in Language Modeling
1.6 Image File Formats
1.6.1 TIFF
1.6.2 PDF
1.6.3 PNG
1.6.4 JPEG
Chapter 2: SIP and PyQT
Introduction
2.1 License
2.2 Features
2.3 SIP Components
2.4 Preparing for SIP v5
2.5 Qt Support
2.6 Installation
2.6.1 Downloading
2.6.2 Configuring
2.6.3 Building
2.6.4 Configuring with Configuration Files
2.7 Using SIP
2.7.1 A Simple C++ Example
2.7.2 A More Complex C++ Example
2.7.3 Ownership of Objects
2.7.4 Types and Meta-types
2.7.5 Lazy Type Attributes
2.8 Support for Python's Buffer Interface
2.9 Support for Wide Characters
2.10 The Python Global Interpreter Lock
2.11 Building a Private Copy of the sip Module
The SIP Command Line
2.12 SIP Specification Files
2.13 Variable Numbers of Arguments
2.14 Additional SIP Types
2.15 Python API for Applications
Chapter 3: pyTesser
3.1 Introduction
3.2 Dependencies
3.3 Installation
3.4 Usage
3.4 File Dependencies
3.5 Python Image Library
3.5.1 Introduction
3.5.2 Image Archives
3.5.3 Image Display
3.5.4 Image Processing
3.5.5 Using the Image Class
Chapter 4: Core Program Source Code
Importing key modules
UI Implementation
File Picker Implementation
OCR Conversions
Main implementation
Calling Main
Chapter 5: Live Example
Conclusions
6.1 Results
6.2 Conclusions on pyOCR
6.3 Future Work
References
Chapter 1: Background
1.1 Introduction
We are moving toward a more digitized world. Computer and PDA screens are replacing traditional books and newspapers. In addition, the large volume of paper archives, which require maintenance because paper decays over time, led to the idea of digitizing them instead of simply scanning them. This requires recognition software that, in an ideal version, reads as well as humans do. Such OCR software is also needed for reading bank checks and postal addresses. Automating these two tasks can save many hours of human work.
These two major trends led to OCR software being developed and licensed to OCR contractors. There is one notable exception to this: pyOCR, the open source OCR software that we have developed. pyOCR was created by us on April 10, 2016 with the goal of providing an open source OCR system capable of performing multiple digitization functions. The applications of this software range from general desktop use and simple document conversion to historical document analysis and reading aids for visually impaired users.
1.2 History of OCR
The idea of OCR technology has been around for a long time and even predates electronic computers.
Figure 1: Statistical Machine Design by Paul W. Handel
This is an image of the original OCR design proposed by Paul W. Handel in 1931. He applied for a patent for a device "in which successive comparisons are made between a character and a character image." A photo-electric apparatus would be used to respond to a coincidence of a character and an image. This means you would shine a light through a filter and, if the light matches up with the correct character of the filter, enough light will come back through the filter and trigger some acceptance mechanism for the corresponding character. This was the first documented vision of this type of technology. The world has come a long way since this prototype.
1.2.1 Template-Matching Method
In 1956, Kelner and Glauberman used magnetic shift registers to project two-dimensional information onto a single axis. The reason for this is to reduce the complexity and make the information easier to interpret. A printed input character on paper is scanned by a photodetector through a slit. The light reflected from the input paper allows the photodetector to segment the character by calculating the proportion of black within the slit. This proportion is sent to a register which converts the analog values to digital values. These samples are then matched to a template by taking the total sum of the differences between each sampled value and the corresponding template value. While this machine was not commercialized, it gives us important insight into the dimensionality of characters. In essence, characters are two-dimensional, and if we want to reduce the dimension to one, we must change the shape of the character for the machine to recognize it.
Figure 2: Illustration of 2-D reduction to 1-D by a slit. (a) An input numeral "4" and a slit scanned from left to right. (b) Black area projected onto the x-axis, the scanning direction of the slit.
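The matching step of Section 1.2.1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the historical hardware: the tiny bitmaps, the function names and the use of absolute differences are assumptions made for the example.

```python
def slit_projection(bitmap):
    """For each slit position (column), return the fraction of black pixels."""
    height = len(bitmap)
    width = len(bitmap[0])
    return [sum(row[col] for row in bitmap) / height for col in range(width)]


def template_distance(samples, template):
    """Total sum of absolute differences between samples and a template."""
    return sum(abs(s - t) for s, t in zip(samples, template))


def classify(bitmap, templates):
    """Return the label of the template closest to the projected bitmap."""
    samples = slit_projection(bitmap)
    return min(templates, key=lambda label: template_distance(samples, templates[label]))


# Hypothetical 5x4 glyphs (1 = black, 0 = white), for illustration only.
GLYPH_ONE = [
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
]
GLYPH_SEVEN = [
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
TEMPLATES = {"1": slit_projection(GLYPH_ONE), "7": slit_projection(GLYPH_SEVEN)}

# A noisy "1" with one stray black pixel in the last column.
NOISY_ONE = [row[:] for row in GLYPH_ONE]
NOISY_ONE[4][3] = 1

print(classify(NOISY_ONE, TEMPLATES))
```

Because each glyph is reduced to one value per slit position, two different shapes with the same column profile would be indistinguishable, which is the dimensionality issue noted above.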
1.2.2 Peephole Method
This is the simplest logical template matching method. Pixels from different zones of the binarized character are matched to template characters. An example would be the letter A, where a pixel would be selected from the white hole in the center, from the black section of the stem, and from some positions outside of the letter.
Figure 3: Illustration of the peephole method.
Each template character has its own mapping of these zones that can be matched with the character that needs to be recognized. The peephole method was first implemented in a machine called the Electronic Reading Automaton in 1957.
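A minimal Python sketch of the peephole idea follows. The 5x5 glyphs and the chosen peephole positions are hypothetical and only illustrate the principle: each template is a small set of pixel positions with the value (black or white) expected there.

```python
def matches(bitmap, peepholes):
    """True if every peephole sees the pixel value its template expects."""
    return all(bitmap[r][c] == expected for (r, c), expected in peepholes.items())


def recognize(bitmap, peephole_sets):
    """Return the labels of all templates whose peepholes match the bitmap."""
    return [label for label, holes in peephole_sets.items() if matches(bitmap, holes)]


# Hypothetical binarized glyphs (1 = black, 0 = white).
LETTER_A = [
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
]
LETTER_L = [
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
]

# (row, column) -> expected pixel value; positions chosen by hand for this example.
PEEPHOLES = {
    "A": {(1, 2): 0, (2, 2): 1, (3, 1): 1, (3, 3): 1, (0, 0): 0},
    "L": {(0, 1): 1, (4, 3): 1, (2, 3): 0, (0, 3): 0},
}

print(recognize(LETTER_A, PEEPHOLES))
print(recognize(LETTER_L, PEEPHOLES))
```

For "A", the peephole at (1, 2) looks into the white hole, (2, 2) at the crossbar, (3, 1) and (3, 3) at the two stems, and (0, 0) outside the letter.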
Figure 4: The Solartron Electronic Reading Automaton
This was produced by Solartron Electronics Group Ltd. and was used on numbers printed from a cash register. It could read 120 characters per second, which was quite fast for its time, and used 100 peepholes to distinguish characters.
1.2.3 Structured Analysis Method
It is very difficult to create a template for handwritten characters. The variations would be too large to have an accurate or functional template. This is where the structure analysis method came into play. This method analyzes the character as a structure that can be broken down into parts. The features of these parts and the relationships between them are then observed to determine the correct character. The issue with this method is how to choose these features and relationships so that all of the different possible characters are properly identified.
If the peephole method is extended to the structured analysis method, peepholes can be viewed on a larger scale. Instead of single pixels, we can now look at a slit or stroke of pixels and determine its relationship with other slits.
Figure 5: Extension of the peephole method to structure analysis.
This technique was first proposed in 1954 with William S. Rohland's "Character Sensing System" patent using a single vertical scan. The feature of each slit is the number of black regions present in it. This is called the cross counting technique.
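The cross counting feature can be illustrated with a short Python sketch. The bitmaps are hypothetical, and treating every column as one vertical slit is an assumption made for the example.

```python
def cross_counts(bitmap):
    """Count the separate black regions (runs) in each vertical slit (column)."""
    counts = []
    for col in range(len(bitmap[0])):
        runs = 0
        previous = 0
        for row in bitmap:
            # A new black region starts where white changes to black.
            if row[col] == 1 and previous == 0:
                runs += 1
            previous = row[col]
        counts.append(runs)
    return counts


# Hypothetical binarized glyphs (1 = black, 0 = white).
GLYPH_E = [
    [1, 1, 1],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
    [1, 1, 1],
]
GLYPH_L = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]

print(cross_counts(GLYPH_E))
print(cross_counts(GLYPH_L))
```

The stem of each letter gives one black region, while the slits crossing the three arms of the "E" give three, so the sequence of counts separates the two shapes without any pixel-exact template.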
1.2.4 Factors influencing OCR software performance
OCR results are mainly attributed to the OCR recognizer software, but there are other factors that can have a considerable impact on the results. The simplest of these factors can be the scanning technique and parameters.
The table below summarizes these factors and provides recommendations for OCR scanning on historic newspapers and other old documents.
Each entry lists the process step, the factors influencing OCR at that step, and the recommended actions for historic newspapers.

Process step: Obtain original source
Factors influencing OCR: Quality of original source
Recommended actions:
• Use original hard copies if budget allows (digitization costs will be considerably higher than for using microfilm).
• Hard copies used for microfilming/digitization should be the most complete and cleanest version possible.
• Use microfilm created after establishment and use of microfilm imaging standards (1990's or later).
• Use master negative microfilm only (first generation) or original copies, no second generation copies.

Process step: Scan file
Factors influencing OCR: Scanning resolution and file format
Recommended actions:
• Scanning resolution should be 300 dpi or above to capture as much image information as possible.
• File format to be lossless, e.g. TIFF, so that no image information (pixels) is lost.

Process step: Create good contrast between black and white in the file (image preprocessing)
Factors influencing OCR: Bit depth of image; image optimization and binarization process; quality of source (density of microfilm)
Recommended actions:
• Scan the image as grayscale or bi-tonal.
• Image optimization for OCR to increase contrast and density needs to be carried out prior to OCR, either in the scanning software or a customized program.
• If the images are grayscale, convert them to image-optimized bi-tonal (binarization).
• Obtain best source quality.
• Check density of microfilm before scanning.

Process step: OCR software - Layout of page analyzed and broken down
Factors influencing OCR: Skewed pages; pages with complex layouts; adequate white space between lines, columns and at edge of page so that text boundaries can be identified
Recommended actions:
• De-skew pages in the image preprocessing step so that word lines are horizontal.
• Layout of pages and white space cannot be changed; work with what you have.

Process step: OCR software - Analyzing stroke edge of each character
Factors influencing OCR: Image optimization; quality of source
Recommended actions:
• Optimize image for OCR so that character edges are smoothed, rounded, sharpened, and contrast increased prior to OCR.
• Obtain best source possible (marked, mouldy or faded source, or characters not in sharp focus or skewed on the page, negatively affect identification of characters).

Process step: OCR software - Matching character edges to pattern images and making decision on what the character is
Factors influencing OCR: Pattern image in OCR software database; algorithms in OCR software
Recommended actions:
• Select good OCR software.

Process step: OCR software - Matching whole words to dictionary and making decisions on confidence
Factors influencing OCR: Algorithms and built-in dictionaries in OCR software
Recommended actions:
• Select good OCR software.

Process step: Train OCR engine
Factors influencing OCR: Depends on how much time you have available to train OCR
Recommended actions:
• Purchase OCR software that has this ability.
• At present it is questionable if training is viable for large scale historic newspaper projects.

Table 1: Potential methods of improving OCR accuracy.
1.3 Independent Component Analysis
This is a method that was developed with the goal of finding a linear representation of nongaussian data so that the components are statistically independent. Data is nongaussian if it does not follow a normal distribution. The cocktail party problem is a great example of the need for a way to analyze mixed data. In this problem, there are two signal sources, two people speaking at the same time, and two sensors, microphones, to collect this data. We would like to be able to take the mixed data of the two speakers collected from these two microphones and somehow separate the data back into the original signals. Each microphone will have a different representation of the mixed signal because the microphones are located in different positions in the room. If we represent these mixed recorded signals as $x_1(t)$ and $x_2(t)$, and the original speech signals as $s_1(t)$ and $s_2(t)$, we can express this as a linear equation:
\[ x_1(t) = a_{11} s_1(t) + a_{12} s_2(t), \qquad x_2(t) = a_{21} s_1(t) + a_{22} s_2(t) \]
where $a_{11}, a_{12}, a_{21}, a_{22}$ are parameters that depend on the distances of the microphones from the speakers. This gives us the nongaussian data we need to properly analyze these signals in an effort to recover the original signals.
Figure 6: The original signals.
Figure 7: The observed mixture of the source signals in Fig. 6.
In order to properly execute Independent Component Analysis the data must go through some initial standardization and satisfy one fundamental condition: nongaussianity. To show why Gaussian variables make ICA impossible, we assume we have an orthogonal mixing matrix and our sources are all gaussian. Then $x_1$ and $x_2$ are gaussian, uncorrelated, and of unit variance. The expression for their joint density will be:
\[ p(x_1, x_2) = \frac{1}{2\pi} \exp\left( -\frac{x_1^2 + x_2^2}{2} \right) \]
The distribution for this equation is shown in the following figure.
Figure 8: The multivariate distribution of two independent gaussian variables.
The density of this distribution is completely symmetric and does not contain any relevant information about the directions of the columns of the mixing matrix. Because there is no relevant information, we have no way to make estimates about this data. We thus need a measure of nongaussianity; this can be done using kurtosis or negentropy.
Kurtosis is the older method of measuring nongaussianity and can be defined for $y$ as:
\[ \operatorname{kurt}(y) = E\{y^4\} - 3\,(E\{y^2\})^2 \]
This simplifies to $E\{y^4\} - 3$ because $y$ is of unit variance, and can be interpreted as the normalized fourth moment $E\{y^4\}$. Kurtosis is zero for a Gaussian random variable and is either positive or negative for most (though not all) nongaussian random variables. For this reason we generally take the absolute value or the square of kurtosis as a measure of nongaussianity.
Kurtosis has been commonly used in ICA because of its simple formulation and its low computational cost. The computational cost is in fact reduced when using the fourth moment of the data as an estimate of its kurtosis. This is due to the following linear properties, which hold for independent $x_1$ and $x_2$ and any scalar $\alpha$:
\[ \operatorname{kurt}(x_1 + x_2) = \operatorname{kurt}(x_1) + \operatorname{kurt}(x_2), \qquad \operatorname{kurt}(\alpha x_1) = \alpha^4 \operatorname{kurt}(x_1) \]
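As a rough illustration of kurtosis as a nongaussianity measure, the following Python sketch estimates it from samples. The sample sizes, the seed and the three test distributions are assumptions chosen for the example; the estimator simply replaces the expectations in the definition with sample averages.

```python
import math
import random


def kurtosis(samples):
    """Sample estimate of kurt(y) = E{y^4} - 3 (E{y^2})^2 for centered data."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    second = sum(c ** 2 for c in centered) / n
    fourth = sum(c ** 4 for c in centered) / n
    return fourth - 3 * second ** 2


rng = random.Random(0)
n = 50000

# Three unit-variance distributions (illustrative choices).
gaussian = [rng.gauss(0, 1) for _ in range(n)]
uniform = [rng.uniform(-math.sqrt(3), math.sqrt(3)) for _ in range(n)]
laplace = [rng.choice((-1, 1)) * rng.expovariate(math.sqrt(2)) for _ in range(n)]

print("gaussian:", kurtosis(gaussian))  # near zero
print("uniform: ", kurtosis(uniform))   # negative (sub-gaussian)
print("laplace: ", kurtosis(laplace))   # positive (super-gaussian)
```

Scaling the data by a factor of 2 multiplies the estimate by 16, in line with the property $\operatorname{kurt}(\alpha x_1) = \alpha^4 \operatorname{kurt}(x_1)$.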
Although kurtosis proved to be very handy for multiple applications, it has one major weakness: its sensitivity to outliers. This means that when using sample data in which the distribution is either random or has some errors, kurtosis can fail at determining its gaussianity. This led to the development of another method called negentropy.
As the name suggests, negentropy is based on the entropy measure, which is a fundamental concept of information theory. Entropy describes the amount of information that can be taken out of the observation of a given variable. A large entropy value means the data is random and unpredictable.
For a discrete random variable $Y$, its entropy is expressed as follows:
\[ H(Y) = -\sum_i P(Y = a_i) \log P(Y = a_i) \]
where the $a_i$ are the possible values of $Y$. In a similar manner, the entropy of a continuous random variable $y$ with density $f(y)$ can be expressed as:
\[ H(y) = -\int f(y) \log f(y) \, dy \]
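The discrete entropy formula can be checked on small examples with the sketch below. Estimating the probabilities from observed frequencies and using the base-2 logarithm (so the result is in bits) are assumptions of this example; the text above does not fix the base of the logarithm.

```python
import math
from collections import Counter


def entropy(observations):
    """Entropy in bits, with P(Y = a_i) estimated by relative frequency."""
    counts = Counter(observations)
    total = len(observations)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


print(entropy("abab"))      # two equally likely symbols
print(entropy("abcd"))      # four equally likely symbols
print(entropy("aaab"))      # a skewed, more predictable source
```

The more evenly the probability is spread over the possible values, the larger the entropy, matching the statement that a large entropy means random and unpredictable data.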
Information theory established that out of all random variables of equal variance, the Gaussian variable has the highest entropy, which can also be attributed to the fact that the Gaussian distribution is the most random distribution.
This result shows that we can obtain a measure of nongaussianity through differential entropy, which is called negentropy.
For a variable $y$ we define its negentropy as:
\[ J(y) = H(y_{gauss}) - H(y) \]
where $y_{gauss}$ is a Gaussian random variable that has the same covariance matrix as the variable $y$.
Negentropy is zero if and only if $y$ has a Gaussian distribution; thus the higher its value, the less Gaussian the variable is. Unlike kurtosis, negentropy is computationally expensive. A solution to this problem is to find simpler approximations of it. The classical approximation of negentropy was developed in 1987 by Jones and Sibson as follows:
\[ J(y) \approx \frac{1}{12} E\{y^3\}^2 + \frac{1}{48} \operatorname{kurt}(y)^2 \]
with the assumption that $y$ has zero mean and unit variance.
A more robust approximation developed by Hyvärinen makes use of nonquadratic functions as follows:
\[ J(y) \approx \sum_{i=1}^{p} k_i \left[ E\{G_i(y)\} - E\{G_i(\nu)\} \right]^2 \]
where the $k_i$ are some positive constants, $\nu$ is the normalized Gaussian variable, and the $G_i$ are some nonquadratic functions.
A common use of this approximation is to take only one nonquadratic function $G$ (choices used in Hyvärinen's work include $G(u) = \frac{1}{a} \log \cosh(a u)$ and $G(u) = -\exp(-u^2/2)$), and the approximation will then be in the form:
\[ J(y) \propto \left[ E\{G(y)\} - E\{G(\nu)\} \right]^2 \]
We have then obtained approximations that provide computational simplicity comparable to the kurtosis measure along with the robustness of negentropy.
To give a brief explanation of why gaussianity is strictly not allowed, we can say that it makes the data completely symmetric, and thus the mixed data will not provide any information on the directions of the columns of the mixing matrix.
As mentioned above, data preprocessing is crucial in that it makes the ICA estimation simpler and better conditioned. Several preprocessing techniques can be applied, such as "centering", which consists of subtracting the mean vector $m = E\{x\}$ from $x$ so as to make $x$ a zero-mean variable, and "whitening", which is a linear transformation of the observed vector $x$ such that its components become uncorrelated and their variances equal unity; the resulting vector is then said to be white.
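Centering and whitening can be sketched for two-dimensional data in plain Python. This is a minimal sketch under stated assumptions: the mixing coefficients and sample data are hypothetical, the whitening matrix is taken to be the inverse square root of the sample covariance (one common choice), and the closed-form square root used here only works for 2x2 symmetric positive-definite matrices.

```python
import math
import random


def center(points):
    """Subtract the mean vector m = E{x} from every observation."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return [(x - mean_x, y - mean_y) for x, y in points]


def covariance(points):
    """Entries (a, b, c) of the covariance [[a, b], [b, c]] of zero-mean points."""
    n = len(points)
    a = sum(x * x for x, _ in points) / n
    b = sum(x * y for x, y in points) / n
    c = sum(y * y for _, y in points) / n
    return a, b, c


def whiten(points):
    """Center the data, then apply V = C^(-1/2) so the covariance becomes I."""
    centered = center(points)
    a, b, c = covariance(centered)
    # Closed form for a 2x2 symmetric positive-definite matrix:
    # C^(1/2) = (C + s*I) / t with s = sqrt(det C), t = sqrt(trace C + 2*s).
    s = math.sqrt(a * c - b * b)
    t = math.sqrt(a + c + 2 * s)
    ra, rb, rc = (a + s) / t, b / t, (c + s) / t
    # Invert the symmetric 2x2 square root to obtain the whitening matrix V.
    det_r = ra * rc - rb * rb
    va, vb, vc = rc / det_r, -rb / det_r, ra / det_r
    return [(va * x + vb * y, vb * x + vc * y) for x, y in centered]


# Illustrative data: two independent uniform sources, mixed by hypothetical
# coefficients and shifted away from zero mean.
rng = random.Random(0)
sources = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(2000)]
mixed = [(2 * s1 + 3 * s2 + 5, 2 * s1 + s2 - 1) for s1, s2 in sources]

white = whiten(mixed)
print(covariance(white))  # close to (1, 0, 1): unit variances, no correlation
```

After whitening, the remaining unknown mixing is an orthogonal transformation, which is why the preprocessing makes the ICA estimation simpler.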
1.4 Energy-based Models for sparse overcomplete representations
Initially there were two approaches to Linear Components Analysis: the Density Modeling approach and the Filtering approach. Density Modeling is based on causal generative models, whereas the Filtering approach uses information maximization techniques. Energy-based models emerged as a unification of these methods because they use density modeling techniques along with filtering techniques.
Figure 9: Approach diagram of Linear Component Analysis
Energy-based models associate an energy with each configuration of the relevant variables in a graphical model. This is a powerful tool as it eliminates the need for proper normalization of the probability distributions. "The parameters of an energy-based model specify a deterministic mapping from an observation vector to a feature vector and the feature vector determines a global energy, $E(x)$." Note that the probability density function of $x$ is expressed as:
\[ p(x) = \frac{e^{-E(x)}}{Z} \]
where $Z$ is a normalization constant.
1.5 Finite State Transducers in Language and Speech Processing
Finite State Machines are used in many areas of computational linguistics because of their convenience and efficiency. They do a great job of describing the important local phenomena encountered in empirical language study. They tend to give a good compact representation of lexical rules, idioms, and clichés within a specific language.
For computational linguistics, we are mainly concerned with time and space efficiency. We achieve time efficiency through the use of a deterministic machine. The time a deterministic machine needs to process an input is usually linear in the size of the input. This fact alone allows us to consider it optimal for time efficiency. We are able to achieve space efficiency with classical minimization algorithms for deterministic automata.
1.5.1 Sequential Transducers
This is an extension of the idea of deterministic automata with deterministic input. This type of transducer is able to produce output strings or weights in addition to deterministically accepting input. This quality is very useful and supports very efficient programs.
1.5.2 Weighted Finite State Transducers
The use of finite state automata contributed a lot to the development of speech recognition and of natural language processing. Such an automaton performs a state transition depending on the input it receives until it reaches one of the final states, the output state.
Figure 10: Simple Finite State Machine
Nowadays in natural language processing the use of another type of finite state machine has become widespread: the transducer.
These transducers keep all the functionality of a simple FSM (finite state machine) but add a weight to each transition. In speech recognition, for example, this weight is the probability of each state transition. In addition, in these transducers the input or output label of a transition can be null. Such a null means that no symbol needs to be consumed or output during the transition. These null labels are needed to create variable length input and output strings. They also provide a good way of delaying the output, via an inner loop for example.
Composition is a common operation in the use of transducers. It provides a way of combining different levels of representation. A common application of this in speech recognition is the composition of a pronunciation lexicon with a word-level grammar to produce a phone-to-word transducer whose word sequences are restricted to the grammar.
Figure 11: Example of transducer composition.
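The behaviour of a weighted transducer with null output labels can be sketched as follows. This is a toy sequential (input-deterministic) transducer, not the composition operation itself; the states, phone symbols, words and probabilities are all hypothetical and chosen only to illustrate the mechanism.

```python
EPS = None  # null (epsilon) output label: nothing is emitted on this arc

# (state, input symbol) -> (next state, output label, weight)
ARCS = {
    (0, "k"): (1, EPS, 1.0),
    (1, "ae"): (2, EPS, 1.0),
    (2, "t"): (0, "cat", 0.6),
    (2, "b"): (0, "cab", 0.4),
}
FINAL_STATES = {0}


def transduce(symbols, arcs=ARCS, start=0, finals=FINAL_STATES):
    """Map an input sequence to (outputs, weight), or None if it is rejected."""
    state, outputs, weight = start, [], 1.0
    for symbol in symbols:
        if (state, symbol) not in arcs:
            return None  # no transition for this input: reject
        state, emitted, arc_weight = arcs[(state, symbol)]
        if emitted is not EPS:
            outputs.append(emitted)
        weight *= arc_weight  # weights are probabilities, so they multiply
    if state not in finals:
        return None
    return outputs, weight


print(transduce(["k", "ae", "t"]))
print(transduce(["k", "ae", "t", "k", "ae", "b"]))
print(transduce(["k", "ae"]))  # stops outside a final state
```

The null output labels let three input phones map to a single output word, which is the variable-length behaviour described above; the word is only emitted on the last arc, once the input is unambiguous.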
1.5.3 Transducers in Language Modeling
Initial approaches to language modeling used affix dictionaries to represent natural languages. This method came in handy to represent languages like English by having a list of the most common words along with possible affixes. However, when trying to represent more languages, it quickly became clear that such an approach fails with agglutinative languages.
An agglutinative language is a language in which words are formed by attaching a chain of affixes to a root, each affix carrying its own meaning, so the number of possible word forms is far too large to list. This is unlike English, in which we generally add a single suffix to obtain another word form, such as the suffix -ly for adverbs. Hungarian falls under the agglutinative languages, for which we needed to create a dictionary and a language model in FST (finite state transducer) format. The representation of such a language can be done by "having the last node of the portion of the FST, which encodes a given suffix, contain outgoing arcs to the first states of portions of the FST which encode other suffixes". The advantage of this technique is that, when applied to all the possible affixes, it yields a solid representation of the agglutinative nature of the language.
1.6 Image File Formats
There are many different file formats available for character recognition software. We primarily dealt with PNG files because PNG was the only usable format in pyOCR, but we faced some challenges during image conversion. Image quality has a huge impact on the effectiveness of any OCR software, and when converting between formats one has to be aware of lossy vs. lossless compression. These were the formats we ran into during this project:
1.6.1 TIFF
TIFF stands for Tagged Image File Format. It can be used as a single-image or multi-image file format (multiple pages in the same file). The TIFF format is very desirable because its most common compression schemes are all lossless. This means that they reduce the file size without losing any quality, and the data can later be restored exactly to its original form.
1.6.2 PDF
Portable Document Format is an open standard created by Adobe. While the ability of a PDF to contain both text and images is very useful for some applications, for OCR input it is an unnecessarily rich capability that only adds to the file size. A TIFF is much more desirable because it contains only images.
1.6.3 PNG
Portable Network Graphics is a lossless format and the one that is used by pyOCR. PNG is an open, single-image, color image format and was created to replace the GIF image format, which supports a maximum of only 256 colors.
1.6.4 JPEG
The acronym JPEG comes from the committee that created the file format, the Joint Photographic Experts Group. This is a lossy image format, but the compression level can be adjusted to trade off storage size against image quality. It is not ideal for OCR software, but it can be used as long as the image is kept at a high quality setting and is not recompressed.
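Since the format of an input file decides whether a lossy conversion step is involved, it can be useful to check the format before handing an image to the OCR stage. The following is a minimal sketch that recognizes the formats listed above from their leading signature bytes, using only the standard library. The function name detect_format and the LOSSY set are illustrative and not part of pyOCR.

```python
# Leading "magic" bytes of the formats discussed in this section.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"II*\x00", "TIFF"),   # little-endian TIFF
    (b"MM\x00*", "TIFF"),   # big-endian TIFF
    (b"%PDF-", "PDF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
]

# Formats whose usual compression discards image information.
LOSSY = {"JPEG"}


def detect_format(data):
    """Return the format name for the given leading bytes, or "unknown"."""
    for signature, name in SIGNATURES:
        if data.startswith(signature):
            return name
    return "unknown"


def needs_quality_check(data):
    """True when the file is in a lossy format and may hurt recognition."""
    return detect_format(data) in LOSSY
```

In practice the first few bytes of a file would be read in binary mode, for example with open(path, "rb").read(8), and passed to detect_format.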
Chapter 2: SIP and PyQt
Introduction
SIP is a tool for automatically generating Python bindings for C and C++ libraries. SIP was originally developed in 1998 for PyQt - the Python bindings for the Qt GUI toolkit - but is suitable for generating bindings for any C or C++ library.
This version of SIP generates bindings for Python v2.3 or later, including Python v3.
There are many other similar tools available. One of the original such tools is SWIG and, in fact, SIP is so called because it started out as a small SWIG. Unlike SWIG, SIP is specifically designed for bringing together Python and C/C++ and goes to great lengths to make the integration as tight as possible.
The homepage for SIP is http://www.riverbankcomputing.com/software/sip. Here you will always find the latest stable version and the latest version of this documentation.
SIP can also be downloaded from the Mercurial repository at http://www.riverbankcomputing.com/hg/sip.
2.1 License
SIP is licensed under similar terms as Python itself. SIP is also licensed under the GPL (both v2 and v3). It is your choice as to which license you use. If you choose the GPL then any bindings you create must be distributed under the terms of the GPL.
2.2 Features
SIP, and the bindings it produces, have the following features:
• bindings are fast to load and minimise memory consumption, especially when only a small sub-set of a large library is being used
• automatic conversion between standard Python and C/C++ data types
• overloading of functions and methods with different argument signatures
• support for Python's keyword argument syntax
• support for both explicitly specified and automatically generated docstrings
• access to a C++ class's protected methods
• the ability to define a Python class that is a sub-class of a C++ class, including abstract C++ classes
• Python sub-classes can implement the __dtor__() method which will be called from the C++ class's virtual destructor
• support for ordinary C++ functions, class methods, static class methods, virtual class methods and abstract class methods
• the ability to re-implement C++ virtual and abstract methods in Python
• support for global and class variables
• support for global and class operators
• support for C++ namespaces
• support for C++ templates
• support for C++ exceptions and wrapping them as Python exceptions
• the automatic generation of complementary rich comparison slots
• support for deprecation warnings
• the ability to define mappings between C++ classes and similar Python data types that are automatically invoked
• the ability to automatically exploit any available run time type information to ensure that the class of a Python instance object matches the class of the corresponding C++ instance
• the ability to change the type and meta-type of the Python object used to wrap a C/C++ data type
• full support of the Python global interpreter lock, including the ability to specify that a C++ function or method may block, therefore allowing the lock to be released and other Python threads to run
• support for consolidated modules where the generated wrapper code for a number of related modules may be included in a single, possibly private, module
• support for the concept of ownership of a C++ instance (i.e. what part of the code is responsible for calling the instance's destructor) and how the ownership may change during the execution of an application
• the ability to generate bindings for a C++ class library that itself is built on another C++ class library which also has had bindings generated so that the different bindings integrate and share code properly
• a sophisticated versioning system that allows the full lifetime of a C++ class library, including any platform specific or optional features, to be described in a single set of specification files
• support for the automatic generation of PEP 484 type hint stub files
• the ability to include documentation in the specification files which can be extracted and subsequently processed by external tools
• the ability to include copyright notices and licensing information in the specification files that is automatically included in all generated source code
• a build system, written in Python, that you can extend to configure, compile and install your own bindings without worrying about platform specific issues
• support for building your extensions using distutils
• SIP, and the bindings it produces, runs under UNIX, Linux, Windows, MacOS/X, Android and iOS.
2.3 SIP Components
SIP comprises a number of different components.
• The SIP code generator (sip). This processes .sip specification files and generates C or C++ bindings. It is covered in detail in Using SIP.
• The SIP header file (sip.h). This contains definitions and data structures needed by the generated C and C++ code.
• The SIP module (sip.so or sip.pyd). This is a Python extension module that is imported automatically by SIP generated bindings and provides them with some common utility functions. See also Python API for Applications.
• The SIP build system (sipconfig.py). This is a pure Python module that is created when SIP is configured and encapsulates all the necessary information about your system including relevant directory names, compiler and linker flags, and version numbers. It also includes several Python classes and functions which help you write configuration scripts for your own bindings. It is covered in detail in The Build System.
• The SIP distutils extension (sipdistutils.py). This is a distutils extension that can be used to build your extension modules using distutils and is an alternative to writing configuration scripts with the SIP build system. This can be as simple as adding your .sip files to the list of files needed to build the extension module. It is covered in detail in Building Your Extension with distutils.
2.4 Preparing for SIP v5
The syntax of a SIP specification file will change in SIP v5. The command line options to the SIP code generator will also change. In order to help users manage the transition the following approach will be adopted.
• Where possible, all incompatible changes will be first implemented in SIP v4.
• When an incompatible change is implemented, the old syntax will be deprecated (with a warning message) but will be supported for the lifetime of v4.
2.5 Qt Support
SIP has specific support for the creation of bindings based on Digia's Qt toolkit.
The SIP code generator understands the signal/slot type safe callback mechanism that Qt uses to connect objects together. This allows applications to define new Python signals, and allows any Python callable object to be used as a slot.
SIP itself does not require Qt to be installed.
2.6 Installation
2.6.1 Downloading
You can get the latest release of the SIP source code from http://www.riverbankcomputing.com/software/sip/download.
SIP is also included with all of the major Linux distributions. However, it may be a version or two out of date.
2.6.2 Configuring
After unpacking the source package (either a .tar.gz or a .zip file depending on your platform) you should then check for any README files that relate to your platform.
Next you need to configure SIP by executing the configure.py script. For example:

    python configure.py

This assumes that the Python interpreter is on your path. Something like the following may be appropriate on Windows:

    c:\python35\python configure.py
If you have multiple versions of Python installed then make sure you use the interpreter for which you wish SIP to generate bindings.
The full set of command line options is:

--version
    Display the SIP version number.
-h, --help
    Display a help message.
--arch ARCH
    Binaries for the MacOS/X architecture ARCH will be built. This option should be given once for each architecture to be built. Specifying more than one architecture will cause a universal binary to be created.
-b DIR, --bindir DIR
    The SIP code generator will be installed in the directory DIR.
--configuration FILE
    New in version 4.16. FILE contains the configuration of the SIP build to be used instead of dynamically introspecting the system and is typically used when cross-compiling. See Configuring with Configuration Files.
-d DIR, --destdir DIR
    The sip module will be installed in the directory DIR.
--deployment-target VERSION
    New in version 4.12.1. Each generated Makefile will set the MACOSX_DEPLOYMENT_TARGET environment variable to VERSION. In order to work around bugs in some versions of Python, this should be used instead of setting the environment variable in the shell.
-e DIR, --incdir DIR
    The SIP header file will be installed in the directory DIR.
-k, --static
    The sip module will be built as a static library. This is useful when building the sip module as a Python builtin.
-n, --universal
    The SIP code generator and module will be built as universal binaries under MacOS/X. If the --arch option has not been specified then the universal binary will include the i386 and ppc architectures.
--no-pyi
    New in version 4.18. This disables the installation of the sip.pyi type hints stub file.
--no-tools
    New in version 4.16. The SIP code generator and sipconfig module will not be installed.
-p PLATFORM, --platform PLATFORM
    Explicitly specify the platform/compiler to be used by the build system, otherwise a platform specific default will be used. The --show-platforms option will display all the supported platform/compilers.
--pyi-dir DIR
    New in version 4.18. DIR is the name of the directory where the sip.pyi type hints stub file is installed. By default this is the directory where the sip module is installed.
-s SDK, --sdk SDK
    If the --universal option was given then this specifies the name of the SDK directory. If a path is not given then it is assumed to be a sub-directory of /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs or /Developer/SDKs.
-u, --debug
    The sip module will be built with debugging symbols.
-v DIR, --sipdir DIR
    By default .sip files will be installed in the directory DIR.
--show-platforms
    The list of all supported platform/compilers will be displayed.
--show-build-macros
    The list of all available build macros will be displayed.
--sip-module NAME
    The sip module will be created with the name NAME rather than the default sip. NAME may be of the form package.sub-package.module. See Building a Private Copy of the sip Module for how to use this to create a private copy of the sip module.
--sysroot DIR
    New in version 4.16. DIR is the name of an optional directory that replaces sys.prefix in the names of other directories (specifically those specifying where the various SIP components will be installed and where the Python include directories can be found). It is typically used when cross-compiling or when building a static version of SIP. See Configuring with Configuration Files.
--target-py-version VERSION
    New in version 4.16. VERSION is the major and minor version (e.g. 3.4) of the version of Python being targeted. By default the version of Python being used to run the configure.py script is used. It is typically used when cross-compiling. See Configuring with Configuration Files.
--use-qmake
    New in version 4.16. Normally the configure.py script uses SIP's own build system to create the Makefiles for the code generator and module. This option causes project files (.pro files) used by Qt's qmake program to be generated instead. qmake should then be run to generate the Makefiles. This is particularly useful when cross-compiling.

The configure.py script takes many other options that allow the build system to be finely tuned. These are of the form name=value or name+=value. The --show-build-macros option will display each supported name, although not all are applicable to all platforms.
The name=value form means that value will replace the existing value of name.
The name+=value form means that value will be appended to the existing value of name.
For example, the following will disable support for C++ exceptions (and so reduce the size of module binaries) when used with GCC:

    python configure.py CXXFLAGS+=-fno-exceptions CFLAGS+=-fno-exceptions
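The difference between the two macro forms (replace versus append) can be made concrete with a small sketch. This is not the code used by configure.py; the function name apply_macro_args and the choice of a single space as the separator when appending are assumptions made for illustration.

```python
def apply_macro_args(macros, args):
    """Apply name=value and name+=value arguments to a macro table.

    macros is a dict of existing build macro values; a new dict is
    returned and the input is left untouched.
    """
    result = dict(macros)
    for arg in args:
        idx = arg.find("=")
        if idx <= 0:
            raise ValueError("not a macro assignment: %r" % arg)
        value = arg[idx + 1:]
        if arg[idx - 1] == "+":
            # name+=value: append to whatever is already there.
            name = arg[:idx - 1]
            if not name:
                raise ValueError("missing macro name: %r" % arg)
            result[name] = (result.get(name, "") + " " + value).strip()
        else:
            # name=value: replace the existing value outright.
            result[arg[:idx]] = value
    return result


defaults = {"CFLAGS": "-O2", "CXX": "c++"}
tuned = apply_macro_args(defaults, ["CFLAGS+=-fno-exceptions", "CXX=g++"])
```

With these inputs CFLAGS keeps its original -O2 and gains the extra flag, while CXX is overwritten, which mirrors the behaviour the two forms are described as having.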
A pure Python module called sipconfig.py is generated by configure.py. This defines each name and its corresponding value. Looking at it will give you a good idea of how the build system uses the different options. It is covered in detail in The Build System.
2.6.3 Building
The next step is to build SIP by running your platform's make command. For example:

    make

The final step is to install SIP by running the following command:

    make install

(Depending on your system you may require root or administrator privileges.) This will install the various SIP components.
2.6.4 Configuring with Configuration Files
The configure.py script normally introspects the Python installation of the interpreter running it in order to determine the names of the various files and directories it needs. This is fine for a native build of SIP but isn't appropriate when cross-compiling. In this case it is possible to supply a configuration file, specified using the --configuration option, which contains definitions of all the required values.
The format of a configuration file is as follows:
• a configuration item is a single line containing a name/value pair separated by =
• a value may include another value by embedding the name of that value surrounded by %( and )
• comments begin with # and continue to the end of the line
• blank lines are ignored.
configure.py provides the following preset values for a configuration:
• py_major is the major version number of the target Python installation.
• py_minor is the minor version number of the target Python installation.
• sysroot is the name of the system root directory. This is specified with the --sysroot option.
The following is an example configuration file:

    # The target Python installation.
    py_platform = linux
    py_inc_dir = %(sysroot)/usr/include/python%(py_major).%(py_minor)

    # Where SIP will be installed.
    sip_bin_dir = %(sysroot)/usr/bin
    sip_module_dir = %(sysroot)/usr/lib/python%(py_major)/dist-packages

The following values can be specified in the configuration file:
• py_platform is the target Python platform.
• py_inc_dir is the target Python include directory containing the Python.h file.
• py_conf_inc_dir is the target Python include directory containing the pyconfig.h file. If this isn't specified then it defaults to the value of py_inc_dir.
• py_pylib_dir is the target Python library directory.
• sip_bin_dir is the name of the target directory where the SIP code generator will be installed. It can be overridden by the --bindir option.
• sip_inc_dir is the name of the target directory where the sip.h file will be installed. If this isn't specified then it defaults to the value of py_inc_dir. It can be overridden by the --incdir option.
• sip_module_dir is the target directory where the sip module will be installed. It can be overridden by the --destdir option.
• sip_sip_dir is the name of the target directory where generated .sip files will be installed by default. It is only used when creating the sipconfig module. It can be overridden by the --sipdir option.
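The configuration file format described above is simple enough to sketch a reader for it. The following is an illustration of the four format rules only, not the parser inside configure.py. The function name parse_configuration is hypothetical, item names are written with underscores as in the SIP documentation, the preset values are made-up sample values, and the recursive expansion of nested %(name) references is an assumption.

```python
import re


def parse_configuration(text, presets):
    """Parse name = value lines and expand %(name) references."""
    values = dict(presets)
    for raw_line in text.splitlines():
        # Comments begin with # and run to the end of the line.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue  # blank lines are ignored
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError("expected name = value: %r" % raw_line)
        values[name.strip()] = value.strip()

    def expand(value, seen):
        def substitute(match):
            ref = match.group(1)
            if ref in seen:
                raise ValueError("circular reference to %r" % ref)
            return expand(values[ref], seen + (ref,))
        return re.sub(r"%\(([^)]+)\)", substitute, value)

    return {name: expand(value, (name,)) for name, value in values.items()}


EXAMPLE = """
# The target Python installation.
py_platform = linux
py_inc_dir = %(sysroot)/usr/include/python%(py_major).%(py_minor)

# Where SIP will be installed.
sip_bin_dir = %(sysroot)/usr/bin
"""

# Sample preset values; the real ones come from the target installation.
presets = {"py_major": "3", "py_minor": "4", "sysroot": "/opt/target"}
cfg = parse_configuration(EXAMPLE, presets)
```

Keeping the presets in the same table as the parsed items lets a value refer to either kind of name with the same %(name) syntax.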
2.7 Using SIP
Bindings are generated by the SIP code generator from a number of specification files, typically with a .sip extension. Specification files look very similar to C and C++ header files, but often with additional information (in the form of a directive or an annotation) and code so that the bindings generated can be finely tuned.
2.7.1 A Simple C++ Example
We start with a simple example. Let's say you have a (fictional) C++ library that implements a single class called Word. The class has one constructor that takes a \0-terminated character string as its single argument. The class has one method called reverse() which takes no arguments and returns a \0-terminated character string. The interface to the class is defined in a header file called word.h which might look something like this:

    // Define the interface to the word library.

    class Word {
        const char *the_word;

    public:
        Word(const char *w);

        char *reverse() const;
    };

The corresponding SIP specification file would then look something like this:

    // Define the SIP wrapper to the word library.

    %Module word

    class Word {

    %TypeHeaderCode
    #include <word.h>
    %End

    public:
        Word(const char *w);

        char *reverse() const;
    };

Obviously a SIP specification file looks very much like a C++ (or C) header file, but SIP does not include a full C++ parser. Let's look at the differences between the two files.
• The %Module directive has been added [1]. This is used to name the Python module that is being created, word in this example.
• The %TypeHeaderCode directive has been added. The text between this and the following %End directive is included literally in the code that SIP generates. Normally it is used, as in this case, to #include the corresponding C++ (or C) header file [2].
• The declaration of the private variable the_word has been removed. SIP does not support access to either private or protected instance variables.
If we want to we can now generate the C++ code in the current directory by running the following command:

    sip -c . word.sip

However, that still leaves us with the task of compiling the generated code and linking it against all the necessary libraries. It's much easier to use the SIP build system to do the whole thing.
Using the SIP build system is simply a matter of writing a small Python script. In this simple example we will assume that the word library we are wrapping and its header file are installed in standard system locations and will be found by the compiler and linker without having to specify any additional flags. In a more realistic example your Python script may take command line options, or search a set of directories to deal with different configurations and installations.
This is the simplest script (conventionally called configure.py):

    import os
    import sipconfig

    # The name of the SIP build file generated by SIP and used by the build
    # system.
    build_file = "word.sbf"

    # Get the SIP configuration information.
    config = sipconfig.Configuration()

    # Run SIP to generate the code.
    os.system(" ".join([config.sip_bin, "-c", ".", "-b", build_file, "word.sip"]))

    # Create the Makefile.
    makefile = sipconfig.SIPModuleMakefile(config, build_file)

    # Add the library we are wrapping.  The name doesn't include any platform
    # specific prefixes or extensions (e.g. the "lib" prefix on UNIX, or the
    # ".dll" extension on Windows).
    makefile.extra_libs = ["word"]

    # Generate the Makefile itself.
    makefile.generate()

Hopefully this script is self-documenting. The key parts are the Configuration and SIPModuleMakefile classes. The build system contains other Makefile classes, for example to build programs or to call other Makefiles in sub-directories.
After running the script (using the Python interpreter the extension module is being created for) the generated C++ code and Makefile will be in the current directory.
To compile and install the extension module, just run the following commands [3]:

    make
    make install

That's all there is to it.
See Building Your Extension with distutils for an example of how to build this example using distutils.
[1] All SIP directives start with a % as the first non-whitespace character of a line.
[2] SIP includes many code directives like this. They differ in where the supplied code is placed by SIP in the generated code.
[3] On Windows you might run nmake or mingw32-make instead.
2.7.2 A More Complex C++ Example
In this last example we will wrap a fictional C++ library that contains a class that is derived from a Qt class. This will demonstrate how SIP allows a class hierarchy to be split across multiple Python extension modules, and will introduce SIP's versioning system.
The library contains a single C++ class called Hello which is derived from Qt's QLabel class. It behaves just like QLabel except that the text in the label is hard coded to be Hello World. To make the example more interesting we'll also say that the library only supports Qt v4.2 and later, and also includes a function called setDefault() that is not implemented in the Windows version of the library.
The hello.h header file looks something like this:

    // Define the interface to the hello library.

    #include <qlabel.h>
    #include <qwidget.h>
    #include <qstring.h>

    class Hello : public QLabel {
        // This is needed by the Qt Meta-Object Compiler.
        Q_OBJECT

    public:
        Hello(QWidget *parent = 0);

    private:
        // Prevent instances from being copied.
        Hello(const Hello &);
        Hello &operator=(const Hello &);
    };

    #if !defined(Q_OS_WIN)
    void setDefault(const QString &def);
    #endif

The corresponding SIP specification file would then look something like this:

    // Define the SIP wrapper to the hello library.

    %Module hello

    %Import QtGui/QtGuimod.sip

    %If (Qt_4_2_0 -)

    class Hello : public QLabel {

    %TypeHeaderCode
    #include <hello.h>
    %End

    public:
        Hello(QWidget *parent /TransferThis/ = 0);

    private:
        Hello(const Hello &);
    };

    %If (!WS_WIN)
    void setDefault(const QString &def);
    %End

    %End

Again we look at the differences, but we'll skip those that we've looked at in previous examples.
• The %Import directive has been added to specify that we are extending the class hierarchy defined in the file QtGui/QtGuimod.sip. This file is part of PyQt4. The build system will take care of finding the file's exact location.
• The %If directive has been added to specify that everything [4] up to the matching %End directive only applies to Qt v4.2 and later. Qt_4_2_0 is a tag defined in QtCoremod.sip [5] using the %Timeline directive. %Timeline is used to define a tag for each version of a library's API you are wrapping, allowing you to maintain all the different versions in a single SIP specification. The build system provides support to configure.py scripts for working out the correct tags to use according to which version of the library is actually installed.
• The TransferThis annotation has been added to the constructor's argument. It specifies that if the argument is not 0 (i.e. the Hello instance being constructed has a parent) then ownership of the instance is transferred from Python to C++. It is needed because Qt maintains objects (i.e. instances derived from the QObject class) in a hierarchy. When an object is destroyed all of its children are also automatically destroyed. It is important, therefore, that the Python garbage collector doesn't also try to destroy them. This is covered in more detail in Ownership of Objects. SIP provides many other annotations that can be applied to arguments, functions and classes. Multiple annotations are separated by commas. Annotations may have values.
• The = operator has been removed. This operator is not supported by SIP.
• The %If directive has been added to specify that everything up to the matching %End directive does not apply to Windows. WS_WIN is another tag defined by PyQt4, this time using the %Platforms directive. Tags defined by the %Platforms directive are mutually exclusive, i.e. only one may be valid at a time [6].
One question you might have at this point is why bother to define the private copy constructor when it can never be called from Python? The answer is to prevent the automatic generation of a public copy constructor.
We now look at the configure.py script. This is a little different to the script in the previous examples for two related reasons.
Firstly, PyQt4 includes a pure Python module called pyqtconfig that extends the SIP build system for modules, like our example, that build on top of PyQt4. It deals with the details of which version of Qt is being used (i.e. it determines what the correct tags are) and where it is installed. This is called a module's configuration module.
Secondly, we generate a configuration module (called helloconfig) for our own hello module. There is no need to do this, but if there is a chance that somebody else might want to extend your C++ library then it would make life easier for them.
Now we have two scripts. First the configure.py script:

    import os
    import sipconfig
    from PyQt4 import pyqtconfig

    # The name of the SIP build file generated by SIP and used by the build
    # system.
    build_file = "hello.sbf"

    # Get the PyQt4 configuration information.
    config = pyqtconfig.Configuration()

    # Get the extra SIP flags needed by the imported PyQt4 modules.  Note that
    # this normally only includes those flags (-x and -t) that relate to SIP's
    # versioning system.
    pyqt_sip_flags = config.pyqt_sip_flags

    # Run SIP to generate the code.  Note that we tell SIP where to find the qt
    # module's specification files using the -I flag.
    os.system(" ".join([config.sip_bin, "-c", ".", "-b", build_file, "-I", config.pyqt_sip_dir, pyqt_sip_flags, "hello.sip"]))

    # We are going to install the SIP specification file for this module and
    # its configuration module.
    installs = []

    installs.append(["hello.sip", os.path.join(config.default_sip_dir, "hello")])

    installs.append(["helloconfig.py", config.default_mod_dir])

    # Create the Makefile.  The QtGuiModuleMakefile class provided by the
    # pyqtconfig module takes care of all the extra preprocessor, compiler and
    # linker flags needed by the Qt library.
    makefile = pyqtconfig.QtGuiModuleMakefile(
        configuration=config,
        build_file=build_file,
        installs=installs
    )

    # Add the library we are wrapping.  The name doesn't include any platform
    # specific prefixes or extensions (e.g. the "lib" prefix on UNIX, or the
    # ".dll" extension on Windows).
    makefile.extra_libs = ["hello"]

    # Generate the Makefile itself.
    makefile.generate()

    # Now we create the configuration module.  This is done by merging a Python
    # dictionary (whose values are normally determined dynamically) with a
    # (static) template.
    content = {
        # Publish where the SIP specifications for this module will be
        # installed.
        "hello_sip_dir":    config.default_sip_dir,

        # Publish the set of SIP flags needed by this module.  As these are the
        # same flags needed by the qt module we could leave it out, but this
        # allows us to change the flags at a later date without breaking
        # scripts that import the configuration module.
        "hello_sip_flags":  pyqt_sip_flags
    }

    # This creates the helloconfig.py module from the helloconfig.py.in
    # template and the dictionary.
    sipconfig.create_config_module("helloconfig.py", "helloconfig.py.in", content)

Next we have the helloconfig.py.in template script:

    from PyQt4 import pyqtconfig

    # These are installation specific values created when Hello was configured.
    # The following line will be replaced when this template is used to create
    # the final configuration module.
    # @SIP_CONFIGURATION@

    class Configuration(pyqtconfig.Configuration):
        """The class that represents Hello configuration values.
        """
        def __init__(self, sub_cfg=None):
            """Initialise an instance of the class.

            sub_cfg is the list of sub-class configurations.  It should be None
            when called normally.
            """
            # This is all standard code to be copied verbatim except for the
            # name of the module containing the super-class.
            if sub_cfg:
                cfg = sub_cfg
            else:
                cfg = []

            cfg.append(_pkg_config)

            pyqtconfig.Configuration.__init__(self, cfg)

    class HelloModuleMakefile(pyqtconfig.QtGuiModuleMakefile):
        """The Makefile class for modules that %Import hello.
        """
        def finalise(self):
            """Finalise the macros.
            """
            # Make sure our C++ library is linked.
            self.extra_libs.append("hello")

            # Let the super-class do what it needs to.
            pyqtconfig.QtGuiModuleMakefile.finalise(self)

Again, we hope that the scripts are self-documenting.
[4] Some parts of a SIP specification aren't subject to version control.
[5] Actually in versions.sip. PyQt4 uses the %Include directive to split the SIP specification for Qt across a large number of separate .sip files.
[6] Tags can also be defined by the %Feature directive. These tags are not mutually exclusive, i.e. any number may be valid at a time.
2.7.3 Ownership of Objects
When a C++ instance is wrapped a corresponding Python object is created. The Python object behaves as you would expect in regard to garbage collection - it is garbage collected when its reference count reaches zero. What then happens to the corresponding C++ instance? The obvious answer might be that the instance's destructor is called. However the library API may say that when the instance is passed to a particular function, the library takes ownership of the instance, i.e. responsibility for calling the instance's destructor is transferred from the SIP generated module to the library.
Ownership of an instance may also be associated with another instance. The implication is that the owned instance will automatically be destroyed if the owning instance is destroyed. SIP keeps track of these relationships to ensure that Python's cyclic garbage collector can detect and break any reference cycles between the owning and owned instances. The association is implemented as the owning instance taking a reference to the owned instance.
The TransferThis, Transfer and TransferBack annotations are used to specify where, and in what direction, transfers of ownership happen. It is very important that these are specified correctly to avoid crashes (where both Python and C++ call the destructor) and memory leaks (where neither Python nor C++ calls the destructor).
This applies equally to C structures where the structure is returned to the heap using the free() function.
2.7.4 Types and Meta-types
Every Python object (with the exception of the object object itself) has a meta-type and at least one super-type. By default an object's meta-type is the meta-type of its first super-type.
SIP implements two super-types, sip.simplewrapper and sip.wrapper, and a meta-type, sip.wrappertype.
sip.simplewrapper is the super-type of sip.wrapper. The super-type of sip.simplewrapper is object.
sip.wrappertype is the meta-type of both sip.simplewrapper and sip.wrapper. The super-type of sip.wrappertype is type.
sip.wrapper supports the concept of object ownership described in Ownership of Objects and, by default, is the super-type of all the types that SIP generates.
sip.simplewrapper does not support the concept of object ownership but SIP generated types that are sub-classed from it have Python objects that take less memory.
SIP allows a class's meta-type and super-type to be explicitly specified using the Metatype and Supertype class annotations.
SIP also allows the default meta-type and super-type to be changed for a module using the %DefaultMetatype and %DefaultSupertype directives. Unlike the default super-type, the default meta-type is inherited by importing modules.
If you want to use your own meta-type or super-type then they must be sub-classed from one of the SIP provided types. Your types must be registered using sipRegisterPyType(). This is normally done in code specified using the %InitialisationCode directive.
As an example, PyQt4 uses %DefaultMetatype to specify a new meta-type that handles the interaction with Qt's own meta-type system. It also uses %DefaultSupertype to specify that the smaller sip.simplewrapper super-type is normally used. Finally it uses Supertype as an annotation of the QObject class to override the default and use sip.wrapper as the super-type so that the parent/child relationships of QObject instances are properly maintained.
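The relationship between a type, its super-types and its meta-type can be checked in plain Python 3. The classes below are hypothetical stand-ins (SIP's real meta-type is sip.wrappertype); the sketch only shows that a class which names no meta-type picks one up from its bases. Strictly, Python selects the most derived meta-type among all the bases, which here coincides with that of the first super-type.

```python
class TracingMeta(type):
    """A hypothetical meta-type, standing in for something like sip.wrappertype."""


class Base(metaclass=TracingMeta):
    """A hypothetical super-type that carries the meta-type."""


class Other:
    """An ordinary class whose meta-type is the built-in type."""


class Derived(Base, Other):
    """No meta-type is named here, so it is taken from the bases."""


meta_of_derived = type(Derived)   # the meta-type of a class is type(cls)
meta_of_other = type(Other)
super_types = Derived.__mro__     # Derived, Base, Other, object
```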
2.7.5 Lazy Type Attributes
Instead of populating a wrapped type's dictionary with its attributes (or descriptors for those attributes) SIP only creates objects for those attributes when they are actually needed. This is done to reduce the memory footprint and start up time when used to wrap large libraries with hundreds of classes and tens of thousands of attributes.
SIP allows you to extend the handling of lazy attributes to your own attribute types by allowing you to register an attribute getter handler (using sipRegisterAttributeGetter()). This will be called just before a type's dictionary is accessed for the first time.
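The idea of creating attribute objects only when they are first needed can be sketched in pure Python 3. This is an illustration of the general technique, not SIP's implementation: LazyAttributes and its factories argument are hypothetical names, and SIP does the equivalent work in C at the level of the type's dictionary.

```python
class LazyAttributes:
    """Creates attribute values on first access and then caches them."""

    def __init__(self, factories):
        # Maps an attribute name to a zero-argument function that builds it.
        self._factories = dict(factories)
        self.created = []   # records the order in which values were built

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. on first access.
        try:
            factory = self._factories[name]
        except KeyError:
            raise AttributeError(name) from None
        value = factory()
        self.created.append(name)
        setattr(self, name, value)   # cache so __getattr__ is not called again
        return value


wrapped = LazyAttributes({"width": lambda: 640, "height": lambda: 480})
before = list(wrapped.created)
first = wrapped.width
second = wrapped.width
after = list(wrapped.created)
```

Nothing is built until wrapped.width is read, and the second read is served from the cached value, so "height" is never created in this run.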
2.8 Support for Python's Buffer Interface
SIP supports Python's buffer interface in that whenever C/C++ requires a char or char * type then any Python type that supports the buffer interface (including ordinary Python strings) can be used.
If a buffer is made up of a number of segments then all but the first will be ignored.
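To make "any Python type that supports the buffer interface" concrete, the following Python 3 sketch accepts several such types through memoryview. The helper first_bytes is a hypothetical name used only for illustration. Note that in Python 3 it is bytes, bytearray and similar types that expose a buffer; the "ordinary Python strings" mentioned above refers to Python 2 strings.

```python
import array


def first_bytes(data, count=4):
    """Return the first `count` bytes of any object exposing the buffer protocol."""
    view = memoryview(data)   # raises TypeError if `data` has no buffer
    return view.cast("B")[:count].tobytes()


from_bytes = first_bytes(b"scanned page")
from_bytearray = first_bytes(bytearray(b"scanned page"))
from_array = first_bytes(array.array("B", [1, 2, 3, 4, 5]))

try:
    first_bytes(42)           # an int does not expose a buffer
    rejected = False
except TypeError:
    rejected = True
```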
2.9 Support for Wide Characters
SIP v4.6 introduced support for wide characters (i.e. the wchar_t type). Python's C API includes support for converting between unicode objects and wide character strings and arrays. When converting from a unicode object to wide characters SIP creates the string or array on the heap (using memory allocated using sipMalloc()). This then raises the problem of how this memory is subsequently freed.
The following describes how SIP handles this memory in the different situations where this is an issue.
• When a wide string or array is passed to a function or method then the memory is freed (using sipFree()) after that function or method returns.
• When a wide string or array is returned from a virtual method then SIP does not free the memory until the next time the method is called.
• When an assignment is made to a wide string or array instance variable then SIP does not first free the instance's current string or array.
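The conversion between a Python unicode object and a wchar_t array can be tried from Python 3 with the standard ctypes module. This is a minimal sketch and not how SIP performs the conversion: ctypes owns and frees the buffer itself, whereas SIP allocates with sipMalloc() and frees according to the rules listed above.

```python
import ctypes

text = "naïve OCR"
wide = ctypes.create_unicode_buffer(text)   # a NUL-terminated wchar_t array
round_trip = wide.value                     # back to a Python string
units = len(wide)                           # wchar_t elements, terminator included
```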
2.10 The Python Global Interpreter Lock
Python's Global Interpreter Lock (GIL) must be acquired before calls can be made to the Python API. It should also be released when a potentially blocking call to a C/C++ library is made in order to allow other Python threads to be executed. In addition, some C/C++ libraries may implement their own locking strategies that conflict with the GIL causing application deadlocks. SIP provides ways of specifying when the GIL is released and acquired to ensure that locking problems can be avoided.
SIP always ensures that the GIL is acquired before making calls to the Python API. By default SIP does not release the GIL when making calls to the C/C++ library being wrapped. The ReleaseGIL annotation can be used to override this behaviour when required.
If SIP is given the -g command line option then the default behaviour is changed and SIP releases the GIL every time it makes calls to the C/C++ library being wrapped. The HoldGIL annotation can be used to override this behaviour when required.
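The effect of releasing the GIL around a blocking call can be observed from pure Python 3. In the sketch below time.sleep stands in for a blocking call into a wrapped library, since it releases the GIL while waiting; blocking_call and run_concurrently are hypothetical helper names. If the GIL were held for the whole call, four threads would take roughly four times as long as one.

```python
import threading
import time


def blocking_call(seconds):
    # time.sleep releases the GIL while it waits, much as a wrapped
    # blocking library call would if it were marked to release the GIL.
    time.sleep(seconds)


def run_concurrently(n_threads, seconds):
    """Run the blocking call in several threads and return the wall time."""
    threads = [threading.Thread(target=blocking_call, args=(seconds,))
               for _ in range(n_threads)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


elapsed = run_concurrently(4, 0.2)   # close to 0.2 s rather than 0.8 s
```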
2.11 Building a Private Copy of the sip Module
New in version 4.12.
The sip module is intended to be used by all the SIP generated modules of a particular Python installation. For example PyQt3 and PyQt4 are completely independent of each other but will use the same sip module. However, this means that all the generated modules must be built against a compatible version of SIP. If you do not have complete control over the Python installation then this may be difficult or even impossible to achieve.
To get around this problem you can build a private copy of the sip module that has a different name and/or is placed in a different Python package. To do this you use the --sip-module option to specify the name (optionally including a package name) of your private copy.
As well as building the private copy of the module, the version of the sip.h header file will also be specific to the private copy. You will probably also want to use the --incdir option to specify the directory where the header file will be installed to avoid overwriting a copy of the default version that might already be installed.
When building your generated modules you must ensure that they #include the private copy of sip.h instead of any default version.
The SIP Command Line
The syntax of the SIP command line is:

    sip [options] [specification]

specification is the name of the specification file for the module. If it is omitted then stdin is used.
The full set of command line options is:

-h
    Display a help message.
-V
    Display the SIP version number.
-a FILE
    Deprecated since version 4.18.
    The name of the QScintilla API file to generate. This file contains a description of the module API in a form that the QScintilla editor component can use for auto-completion and call tips. (The file may also be used by the SciTE editor but must be sorted first.) By default the file is not generated.
-b FILE
    The name of the build file to generate. This file contains the information about the module needed by the SIP build system to generate a platform and compiler specific Makefile for the module. By default the file is not generated.
-B TAG
    New in version 4.16.
    The tag is added to the list of backstops. The option may be given more than once if multiple timelines have been defined. See the %Timeline directive for more details.
-c DIR
    The name of the directory (which must exist) into which all of the generated C or C++ code is placed. By default no code is generated.
-d FILE
    Deprecated since version 4.12: Use the -X option instead.
    The name of the documentation file to generate. Documentation is included in specification files using the %Doc and %ExportedDoc directives. By default the file is not generated.
-e
    Support for C++ exceptions is enabled. This causes all calls to C++ code to be enclosed in try/catch blocks and C++ exceptions to be converted to Python exceptions. By default exception support is disabled.
-f
    New in version 4.18.
    Warnings are handled as if they were errors and the program terminates.
-g
    The Python GIL is released before making any calls to the C/C++ library being wrapped and reacquired afterwards. See The Python Global Interpreter Lock and the ReleaseGIL and HoldGIL annotations.
-I DIR
    The directory is added to the list of directories searched when looking for a specification file given in an %Include or %Import directive. Directory separators must always be /. This option may be given any number of times.
-j NUMBER
    The generated code is split into the given number of files. This makes it easier to use the parallel build facility of most modern implementations of make. By default 1 file is generated for each C structure or C++ class.
-k
    New in version 4.10.
    Deprecated since version 4.12: Use the keyword_arguments="All" %Module directive argument instead.
    All functions and methods will, by default, support passing parameters using the Python keyword argument syntax.
-o
    New in version 4.10.
    Docstrings will be automatically generated that describe the signature of all functions, methods and constructors.
-p MODULE
    The name of the %ConsolidatedModule which will contain the wrapper code for this component module.
-P
    New in version 4.10.
    By default SIP generates code to provide access to protected C++ functions from Python. On some platforms (notably Linux, but not Windows) this code can be avoided if the protected keyword is redefined as public during compilation. This can result in a significant reduction in the size of a generated Python module. This option disables the generation of the extra code.
-r
    Debugging statements that trace the execution of the bindings are automatically generated. By default the statements are not generated.
-s SUFFIX
    The suffix to use for generated C or C++ source files. By default .c is used for C and .cpp for C++.
-t TAG
    The SIP version tag (declared using a %Timeline directive) or the SIP platform tag (declared using the %Platforms directive) to generate code for. This option may be given any number of times so long as the tags do not conflict.
-T
    Deprecated since version 4.16.6: This option is now ignored and timestamps are always disabled.
    By default the generated C and C++ source and header files include a timestamp specifying when they were generated. This option disables the timestamp so that the contents of the generated files remain constant for a particular version of SIP.
-w
    The display of warning messages is enabled. By default warning messages are disabled.
-x FEATURE
    The feature (declared using the %Feature directive) is disabled.
-X ID:FILE
    New in version 4.12.
    The extract (defined with the %Extract directive) with the identifier ID is written to the file FILE.
-y FILE
    New in version 4.18.
    The name of the Python type hints stub file to generate. This file contains a description of the module API that is compliant with PEP 484. By default the file is not generated.
-z FILE
    Deprecated since version 4.16.6: Use the @FILE style instead.
    The name of a file containing more command line options.

Command line options can also be placed in a file and passed on the command line using the @ prefix.
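A build script usually assembles the sip command line from a handful of settings. The Python 3 sketch below builds such an argument list using only options described in this section (-c, -e, -g, -I, -t and -j). The function build_sip_command, the file name word.sip and the directory names are hypothetical and chosen purely for illustration.

```python
def build_sip_command(spec, code_dir, include_dirs=(), tags=(),
                      exceptions=False, release_gil=False, parts=None):
    """Assemble an argument list for the sip executable."""
    cmd = ["sip", "-c", code_dir]   # -c: directory for the generated code
    if exceptions:
        cmd.append("-e")            # -e: enable C++ exception support
    if release_gil:
        cmd.append("-g")            # -g: release the GIL around library calls
    for directory in include_dirs:
        cmd += ["-I", directory]    # -I: extra search directory
    for tag in tags:
        cmd += ["-t", tag]          # -t: version or platform tag
    if parts is not None:
        cmd += ["-j", str(parts)]   # -j: split the output into N files
    cmd.append(spec)                # the specification file comes last
    return cmd


command = build_sip_command("word.sip", "build", include_dirs=["sip"],
                            exceptions=True, parts=2)
```

The resulting list could be passed to subprocess.run(command, check=True) on a machine where sip is installed; building a list rather than a single string avoids shell quoting problems with directory names.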
2.12 SIP Specification Files
A SIP specification consists of some C/C++ type and function declarations and some directives. The declarations may contain annotations which provide SIP with additional information that cannot be expressed in C/C++. SIP does not include a full C/C++ parser.
It is important to understand that a SIP specification describes the Python API, i.e. the API available to the Python programmer when they import the generated module. It does not have to accurately represent the underlying C/C++ library. There is nothing wrong with omitting functions that make little sense in a Python context, or adding functions implemented with handwritten code that have no C/C++ equivalent. It is even possible (and sometimes necessary) to specify a different super-class hierarchy for a C++ class. All that matters is that the generated code compiles properly.
In most cases the Python API matches the C/C++ API. In some cases handwritten code (see %MethodCode) is used to map from one to the other without SIP having to know the details itself. However, there are a few cases where SIP generates a thin wrapper around a C++ method or constructor (see Generated Derived Classes) and needs to know the exact C++ signature. To deal with these cases SIP allows two signatures to be specified. For example:
    class Klass
    {
    public:
        // The Python signature is a tuple, but the underlying C++ signature
        // is a 2 element array.
        Klass(SIP_PYTUPLE) [(int *)];
    %MethodCode
            int iarr[2];

            if (PyArg_ParseTuple(a0, "ii", &iarr[0], &iarr[1]))
            {
                // Note that we use the SIP generated derived class
                // constructor.
                Py_BEGIN_ALLOW_THREADS
                sipCpp = new sipKlass(iarr);
                Py_END_ALLOW_THREADS
            }
    %End
    };
2.13 Variable Numbers of Arguments
SIP supports the use of ... as the last part of a function signature. Any remaining arguments are collected as a Python tuple.
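The Python side of this behaviour mirrors what Python itself does with a starred parameter: everything after the named arguments arrives as a tuple. A minimal Python 3 analogy follows (log_values is a hypothetical function, not something SIP generates).

```python
def log_values(label, *rest):
    """`rest` receives every argument after `label`, collected as a tuple."""
    return label, rest


label, rest = log_values("widths", 10, 20, 30)
_, empty = log_values("none")   # no extra arguments gives an empty tuple
```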
2.14 Additional SIP Types
SIP supports a number of additional data types that can be used in Python signatures.

SIP_ANYSLOT
    Deprecated since version 4.18.
    This is both a const char * and a PyObject * that is used as the type of the member instead of const char * in functions that implement the connection or disconnection of an explicitly generated signal to a slot. Handwritten code must be provided to interpret the conversion correctly.
SIP_PYBUFFER
    This is a PyObject * that implements the Python buffer protocol.
SIP_PYCALLABLE
    This is a PyObject * that is a Python callable object.
SIP_PYDICT
    This is a PyObject * that is a Python dictionary object.
SIP_PYLIST
    This is a PyObject * that is a Python list object.
SIP_PYOBJECT
    This is a PyObject * of any Python type. The type PyObject * can also be used.
SIP_PYSLICE
    This is a PyObject * that is a Python slice object.
SIP_PYTUPLE
    This is a PyObject * that is a Python tuple object.
SIP_PYTYPE
    This is a PyObject * that is a Python type object.
SIP_QOBJECT
    Deprecated since version 4.18.
    This is a QObject * that is a C++ instance of a class derived from Qt's QObject class.
SIP_RXOBJ_CON
    Deprecated since version 4.18.
    This is a QObject * that is a C++ instance of a class derived from Qt's QObject class. It is used as the type of the receiver instead of const QObject * in functions that implement a connection to a slot.
SIP_RXOBJ_DIS
    Deprecated since version 4.18.
    This is a QObject * that is a C++ instance of a class derived from Qt's QObject class. It is used as the type of the receiver instead of const QObject * in functions that implement a disconnection from a slot.
SIP_SIGNAL
    Deprecated since version 4.18.
    This is a const char * that is used as the type of the signal instead of const char * in functions that implement the connection or disconnection of an explicitly generated signal to a slot.
SIP_SLOT
    Deprecated since version 4.18.
    This is a const char * that is used as the type of the member instead of const char * in functions that implement the connection or disconnection of an explicitly generated signal to a slot.
SIP_SLOT_CON
    Deprecated since version 4.18.
    This is a const char * that is used as the type of the member instead of const char * in functions that implement the connection of an internally generated signal to a slot. The type includes a comma separated list of types that is the C++ signature of the signal.
    To take an example, QAccel::connectItem() connects an internally generated signal to a slot. The signal is emitted when the keyboard accelerator is activated and it has a single integer argument that is the ID of the accelerator. The C++ signature is:

        bool connectItem(int id, const QObject *receiver, const char *member);

    The corresponding SIP specification is:

        bool connectItem(int, SIP_RXOBJ_CON, SIP_SLOT_CON(int));

SIP_SLOT_DIS
    Deprecated since version 4.18.
    This is a const char * that is used as the type of the member instead of const char * in functions that implement the disconnection of an internally generated signal to a slot. The type includes a comma separated list of types that is the C++ signature of the signal.
SIP_SSIZE_T
    This is a Py_ssize_t in Python v2.5 and later and int in earlier versions of Python.
2.15 Python API for Applications
The main purpose of the sip module is to provide functionality common to all SIP generated bindings. It is loaded automatically and most of the time you will completely ignore it. However, it does expose some functionality that can be used by applications.

class sip.array
    New in version 4.15.
    This is the type object for the type SIP uses to represent an array of a limited number of C/C++ types. Typically the memory is not owned by Python so that it is not freed when the object is garbage collected. A sip.array object can be created from a sip.voidptr object by calling sip.voidptr.asarray(). This allows the underlying memory (interpreted as a sequence of unsigned bytes) to be processed much more quickly.

sip.cast(obj, type) → object
    This does the Python equivalent of casting a C++ instance to one of its sub or super-class types.
    Parameters: obj – the Python object.
                type – the type.
    Returns: a new Python object that wraps the same C++ instance as obj, but has the type type.

sip.delete(obj)
    For C++ instances this calls the C++ destructor. For C structures it returns the structure's memory to the heap.
    Parameters: obj – the Python object.

sip.dump(obj)
    This displays various bits of useful information about the internal state of the Python object that wraps a C++ instance or C structure.
    Parameters: obj – the Python object.

sip.enableautoconversion(type, enable) → bool
    New in version 4.14.7.
    Instances of some classes may be automatically converted to other Python objects even though the class has been wrapped. This allows that behaviour to be suppressed so that an instance of the wrapped class is returned instead.
    Parameters: type – the Python type object.
                enable – is True if auto-conversion should be enabled for the type. This is the default behaviour.
    Returns: True or False depending on whether or not auto-conversion was previously enabled for the type. This allows the previous state to be restored later on.

sip.getapi(name) → version
    New in version 4.9.
    This returns the version number that has been set for an API. The version number is either set explicitly by a call to sip.setapi() or implicitly by importing the module that defines it.
    Parameters: name – the name of the API.
    Returns: the version number that has been set for the API. An exception will be raised if the API is unknown.

sip.isdeleted(obj) → bool
    This checks if the C++ instance or C structure has been deleted and returned to the heap.
    Parameters: obj – the Python object.
    Returns: True if the C/C++ instance has been deleted.

sip.ispycreated(obj) → bool
    New in version 4.12.1.
    This checks if the C++ instance or C structure was created by Python. If it was then it is possible to call a C++ instance's protected methods.
    Parameters: obj – the Python object.
    Returns: True if the C/C++ instance was created by Python.

sip.ispyowned(obj) → bool
    This checks if the C++ instance or C structure is owned by Python.
    Parameters: obj – the Python object.
    Returns: True if the C/C++ instance is owned by Python.
sip.setapi(name, version)
    New in version 4.9.
    This sets the version number of an API. An exception is raised if a different version number has already been set, either explicitly by a previous call, or implicitly by importing the module that defines it.
    Parameters: name – the name of the API.
                version – the version number to set for the API. Version numbers must be greater than or equal to 1.

sip.setdeleted(obj)
    This marks the C++ instance or C structure as having been deleted and returned to the heap so that future references to it raise an exception rather than cause a program crash. Normally SIP handles such things automatically, but there may be circumstances where this isn't possible.
    Parameters: obj – the Python object.

sip.setdestroyonexit(destroy)
    New in version 4.14.2.
    When the Python interpreter exits it garbage collects those objects that it can. This means that any corresponding C++ instances and C structures owned by Python are destroyed. Unfortunately this happens in an unpredictable order and so can cause memory faults within the wrapped library. Calling this function with a value of False disables the automatic destruction of C++ instances and C structures.
    Parameters: destroy – True if all C++ instances and C structures owned by Python should be destroyed when the interpreter exits. This is the default.

sip.settracemask(mask)
    If the bindings have been created with SIP's -r command line option then the generated code will include debugging statements that trace the execution of the code. (It is particularly useful when trying to understand the operation of a C++ library's virtual function calls.)
    Parameters: mask – the mask that determines which debugging statements are enabled.
    Debugging statements are generated at the following points:
    • in a C++ virtual function (mask is 0x0001)
    • in a C++ constructor (mask is 0x0002)
    • in a C++ destructor (mask is 0x0004)
    • in a Python type's __init__ method (mask is 0x0008)
    • in a Python type's __del__ method (mask is 0x0010)
    • in a Python type's ordinary method (mask is 0x0020).
    By default the trace mask is zero and all debugging statements are disabled.

class sip.simplewrapper
    This is an alternative type object that can be used as the base type of an instance wrapped by SIP. Objects using this are smaller than those that use the default sip.wrapper type but do not support the concept of object ownership.

sip.SIP_VERSION
    New in version 4.2.
    This is a Python integer object that represents the SIP version number as a 3 part hexadecimal number (e.g. v4.0.0 is represented as 0x040000).

sip.SIP_VERSION_STR
    New in version 4.3.
    This is a Python string object that defines the SIP version number as represented as a string. For development versions it will contain either .dev or -snapshot-.

sip.transferback(obj)
    This function is a wrapper around sipTransferBack().

sip.transferto(obj, owner)
    This function is a wrapper around sipTransferTo().

sip.unwrapinstance(obj) → integer
    This returns the address, as an integer, of a wrapped C/C++ structure or class instance.
    Parameters: obj – the Python object.
    Returns: an integer that is the address of the C/C++ instance.

class sip.voidptr
    This is the type object for the type SIP uses to represent a C/C++ void *. It may have a size associated with the address in which case the Python buffer interface is supported. The type has the following methods.

    __init__(address[, size=-1[, writeable=True]])
        Parameters: address – the address, either another sip.voidptr, None, a Python Capsule, a Python CObject, an object that implements the buffer protocol or an integer.
                    size – the optional associated size of the block of memory and is negative if the size is not known.
                    writeable – set if the memory is writeable. If it is not specified, and address is a sip.voidptr instance then its value will be used.

    __int__() → integer
        This returns the address as an integer.
        Returns: the integer address.

    __getitem__(idx) → item
        New in version 4.12.
        This returns the item at a given index. An exception will be raised if the address does not have an associated size. In this way it behaves like a Python memoryview object.
        Parameters: idx – is the index which may either be an integer, an object that implements __index__() or a slice object.
        Returns: the item. If the index is an integer then the item will be a Python v2 string object or a Python v3 bytes object containing the single byte at that index. If the index is a slice object then the item will be a new voidptr object defining the subset of the memory corresponding to the slice.

    __hex__() → string
        This returns the address as a hexadecimal string.
        Returns: the hexadecimal string address.

    __len__() → integer
        New in version 4.12.
        This returns the size associated with the address.
        Returns: the associated size. An exception will be raised if there is none.

    __setitem__(idx, item)
        New in version 4.12.
        This updates the memory at a given index. An exception will be raised if the address does not have an associated size or is not writable. In this way it behaves like a Python memoryview object.
        Parameters: idx – is the index which may either be an integer, an object that implements __index__() or a slice object.
                    item – is the data that will update the memory defined by the index. It must implement the buffer interface and be the same size as the data that is being updated.

    asarray([size=-1]) → sip.array
        New in version 4.16.5.
        This returns the block of memory as a sip.array object. The memory is not copied.
        Parameters: size – the size of the array. If it is negative then the size associated with the address is used. If there is no associated size then an exception is raised.
        Returns: the sip.array object.

    ascapsule() → capsule
        New in version 4.10.
        This returns the address as an unnamed Python Capsule. This requires Python v3.1 or later or Python v2.7 or later.
        Returns: the Capsule.

    ascobject() → cObject
        This returns the address as a Python CObject. This is deprecated with Python v3.1 and is not supported with Python v3.2 and later.
        Returns: the CObject.

    asstring([size=-1]) → string/bytes
        This returns a copy of the block of memory as a Python v2 string object or a Python v3 bytes object.
        Parameters: size – the number of bytes to copy. If it is negative then the size associated with the address is used. If there is no associated size then an exception is raised.
        Returns: the string or bytes object.

    getsize() → integer
        This returns the size associated with the address.
        Returns: the associated size which will be negative if there is none.

    setsize(size)
        This sets the size associated with the address.
        Parameters: size – the size to associate. If it is negative then no size is associated.

    getwriteable() → bool
        This returns the writeable state of the memory.
        Returns: True if the memory is writeable.

    setwriteable(writeable)
        This sets the writeable state of the memory.
        Parameters: writeable – the writeable state to set.

sip.wrapinstance(addr, type) → object
    This wraps a C structure or C++ class instance in a Python object. If the instance has already been wrapped then a new reference to the existing object is returned.
    Parameters: addr – the address of the instance as a number.
                type – the Python type of the instance.
    Returns: the Python object that wraps the instance.

class sip.wrapper
    This is the type object of the default base type of all instances wrapped by SIP. The Supertype class annotation can be used to specify a different base type for a class.

class sip.wrappertype
    This is the type object of the metatype of the sip.wrapper type.
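Two of the values described in this section are packed integers: the trace mask passed to sip.settracemask() and the 3 part hexadecimal number held in sip.SIP_VERSION. The Python 3 sketch below shows how such values are combined and unpacked. The constant names and decode_sip_version are hypothetical helpers introduced for illustration; only the numeric values come from the text above, and the code does not import sip.

```python
# Trace mask bits as listed for sip.settracemask(); the names are our own.
TRACE_VIRTUAL = 0x0001
TRACE_CTOR = 0x0002
TRACE_DTOR = 0x0004
TRACE_INIT = 0x0008
TRACE_DEL = 0x0010
TRACE_METHOD = 0x0020


def decode_sip_version(number):
    """Split a 3 part hexadecimal version such as 0x040000 into its parts."""
    return (number >> 16) & 0xFF, (number >> 8) & 0xFF, number & 0xFF


# Trace only construction and destruction of C++ instances.
mask = TRACE_CTOR | TRACE_DTOR

version = decode_sip_version(0x040000)   # the v4.0.0 example from the text
newer = decode_sip_version(0x041201)     # each part occupies one byte
```

With the real module installed, the mask would be passed as sip.settracemask(mask) and the decoder applied to sip.SIP_VERSION.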
Chapter 3: pyTesser
3.1 Introduction:
PyTesser is an Optical Character Recognition module for Python. It takes as input an image or image file and outputs a string.
PyTesser uses the Tesseract OCR engine (an Open Source project at Google), converting images to an accepted format and calling the Tesseract executable as an external script. A Windows executable is provided along with the Python scripts. The scripts should work in Linux as well.
PyTesser: http://code.google.com/p/pytesser/
Tesseract: http://code.google.com/p/tesseract-ocr/
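The approach described above, calling the Tesseract executable as an external program and reading back its output, can be sketched in a few lines of Python 3. This is not PyTesser's actual source. The helper names are hypothetical, the executable is assumed to be on the PATH under the name tesseract, and the sketch relies on the Tesseract command line convention of writing its result to the given output base name plus .txt.

```python
import os
import subprocess


def tesseract_command(image_path, output_base, executable="tesseract"):
    """Build the argument list: tesseract <image> <output base>."""
    return [executable, image_path, output_base]


def image_file_to_text(image_path, output_base="ocr_output"):
    """Run Tesseract on an image file and return the recognised text."""
    subprocess.run(tesseract_command(image_path, output_base), check=True)
    text_path = output_base + ".txt"   # Tesseract appends .txt itself
    try:
        with open(text_path, encoding="utf-8") as handle:
            return handle.read()
    finally:
        if os.path.exists(text_path):
            os.remove(text_path)       # clean up the temporary result file


command = tesseract_command("phototest.tif", "ocr_output")
```

Only the command builder is exercised here. image_file_to_text needs a working Tesseract installation and an image in a format that Tesseract accepts.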
3.2 Dependencies:
PIL is required to work with images in memory. PyTesser has been tested with Python 2.4 in Windows XP.
http://www.pythonware.com/products/pil/
3.3 Installation:
PyTesser has no installation functionality in this release. Extract pytesser.zip into a directory with your other scripts. Necessary files are listed in File Dependencies below.
3.4 Usage:
    >>> from pytesser import *
    >>> im = Image.open('phototest.tif')
    >>> text = image_to_string(im)
    >>> print text
    This is a lot of 12 point text to test the ocr code and see if it works on all types of file format.
    The quick brown dog jumped over the lazy fox. The quick brown dog jumped over the lazy fox. The quick brown dog jumped over the lazy fox. The quick brown dog jumped over the lazy fox.
    >>> try:
    ...     text = image_file_to_string('fnord.tif', graceful_errors=False)
    ... except errors.Tesser_General_Exception, value:
    ...     print "fnord.tif is incompatible filetype. Try graceful_errors=True"
    ...     print value
    ...
    fnord.tif is incompatible filetype. Try graceful_errors=True
    Tesseract Open Source OCR Engine
    read_tif_image:Error:Illegal image format:Compression
    Tessedit:Error:Read of file failed:fnord.tif
    Signal_exit 31 ABORT. LocCode: 3 AbortCode: 3
    >>> text = image_file_to_string('fnord.tif', graceful_errors=True)
    >>> print "fnord.tif contents:", text
    fnord.tif contents: fnord
    >>> text = image_file_to_string('fonts_test.png', graceful_errors=True)
    >>> print text
    12 pt And Arnazwngw few dwscotheques provwde jukeboxes
    Tames Amazmgly few dnscotheques pmvxde Jukeboxes
    24 pt: Arial: Amazingly few discotheques provide jul
File Dependencies:
    pytesser.py     Main module for importing
    util.py         Utility functions used by pytesser.py
    errors.py       Interprets exceptions thrown by Tesseract
    tesseract.exe   Executable called by pytesser.py
    tessdata/       Resources used by tesseract.exe
3.5 Python Image Library
3.5.1 Introduction
The Python Imaging Library adds image processing capabilities to your Python interpreter.
This library provides extensive file format support, an efficient internal representation, and fairly powerful image processing capabilities.
The core image library is designed for fast access to data stored in a few basic pixel formats. It should provide a solid foundation for a general image processing tool.
Let's look at a few possible uses of this library:
3.5.2 Image Archives
The Python Imaging Library is ideal for image archival and batch processing applications. You can use the library to create thumbnails, convert between file formats, print images, etc.
The current version identifies and reads a large number of formats. Write support is intentionally restricted to the most commonly used interchange and presentation formats.
3.5.3 Image Display
The current release includes Tk PhotoImage and BitmapImage interfaces, as well as a Windows DIB interface that can be used with PythonWin and other Windows-based toolkits. Many other GUI toolkits come with some kind of PIL support.
For debugging, there's also a show method which saves an image to disk, and calls an external display utility.
3.5.4 Image Processing
The library contains basic image processing functionality, including point operations, filtering with a set of built-in convolution kernels, and colour space conversions.
The library also supports image resizing, rotation and arbitrary affine transforms.
There's a histogram method allowing you to pull some statistics out of an image. This can be used for automatic contrast enhancement, and for global statistical analysis.
3.5.5 Using the Image Class

The most important class in the Python Imaging Library is the Image class, defined in the module with the same name. You can create instances of this class in several ways: by loading images from files, by processing other images, or by creating images from scratch.

To load an image from a file, use the open function in the Image module.

    from PIL import Image
    im = Image.open("lena.ppm")

If successful, this function returns an Image object. You can now use instance attributes to examine the file contents.

    print im.format, im.size, im.mode
    PPM (512, 512) RGB

The format attribute identifies the source of an image. If the image was not read from a file, it is set to None. The size attribute is a 2-tuple containing width and height (in pixels). The mode attribute defines the number and names of the bands in the image, and also the pixel type and depth. Common modes are "L" (luminance) for greyscale images, "RGB" for true colour images, and "CMYK" for pre-press images.

If the file cannot be opened, an IOError exception is raised.

Once you have an instance of the Image class, you can use the methods defined by this class to process and manipulate the image. For example, let's display the image we just loaded:

    im.show()

(The standard version of show is not very efficient, since it saves the image to a temporary file and calls the xv utility to display the image. If you don't have xv installed, it won't even work. When it does work though, it is very handy for debugging and tests.)

The following sections provide an overview of the different functions provided in this library.
Reading and Writing Images

The Python Imaging Library supports a wide variety of image file formats. To read files from disk, use the open function in the Image module. You don't have to know the file format to open a file. The library automatically determines the format based on the contents of the file.

To save a file, use the save method of the Image class. When saving files, the name becomes important. Unless you specify the format, the library uses the filename extension to discover which file storage format to use.
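Because the extension decides the output format, a batch conversion mostly comes down to building the right output name. The following is a minimal sketch of that step only, using the standard library; the helper name jpeg_name is hypothetical and is not part of the imaging library.

```python
import os


def jpeg_name(infile):
    """Return the .jpg output name for infile, or None if it is already that name."""
    # splitext only strips the last extension: "a.tar.gz" -> ("a.tar", ".gz")
    root, _ext = os.path.splitext(infile)
    outfile = root + ".jpg"
    return None if outfile == infile else outfile
```

Returning None for a file that already carries the target name avoids overwriting the input with itself.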
Convert files to JPEG

    import os, sys
    from PIL import Image

    for infile in sys.argv[1:]:
        f, e = os.path.splitext(infile)
        outfile = f + ".jpg"
        if infile != outfile:
            try:
                Image.open(infile).save(outfile)
            except IOError:
                print "cannot convert", infile

A second argument can be supplied to the save method which explicitly specifies a file format. If you use a non-standard extension, you must always specify the format this way:

Create JPEG Thumbnails

    import os, sys
    from PIL import Image

    size = 128, 128

    for infile in sys.argv[1:]:
        outfile = os.path.splitext(infile)[0] + ".thumbnail"
        if infile != outfile:
            try:
                im = Image.open(infile)
                im.thumbnail(size)
                im.save(outfile, "JPEG")
            except IOError:
                print "cannot create thumbnail for", infile

It is important to note that the library doesn't decode or load the raster data unless it really has to. When you open a file, the file header is read to determine the file format and extract things like mode, size, and other properties required to decode the file, but the rest of the file is not processed until later.

This means that opening an image file is a fast operation, which is independent of the file size and compression type. Here's a simple script to quickly identify a set of image files:

Identify Image Files

    import sys
    from PIL import Image

    for infile in sys.argv[1:]:
        try:
            im = Image.open(infile)
            print infile, im.format, "%dx%d" % im.size, im.mode
        except IOError:
            pass
Cutting, Pasting and Merging Images

The Image class contains methods allowing you to manipulate regions within an image. To extract a sub-rectangle from an image, use the crop method.

Copying a subrectangle from an image

    box = (100, 100, 400, 400)
    region = im.crop(box)

The region is defined by a 4-tuple, where coordinates are (left, upper, right, lower). The Python Imaging Library uses a coordinate system with (0, 0) in the upper left corner. Also note that coordinates refer to positions between the pixels, so the region in the above example is exactly 300x300 pixels.

The region could now be processed in a certain manner and pasted back.

Processing a subrectangle, and pasting it back

    region = region.transpose(Image.ROTATE_180)
    im.paste(region, box)

When pasting regions back, the size of the region must match the given region exactly. In addition, the region cannot extend outside the image. However, the modes of the original image and the region do not need to match. If they don't, the region is automatically converted before being pasted (see the section on Colour Transforms below for details).
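The box arithmetic and the two pasting rules stated above can be checked without any image data. This is a small sketch under the assumption that the rules apply exactly as described in this section; the helper names box_size and can_paste are hypothetical, and the library itself may be more lenient.

```python
def box_size(box):
    """Width and height of a (left, upper, right, lower) box."""
    left, upper, right, lower = box
    return right - left, lower - upper


def can_paste(region_size, box, image_size):
    """True if region_size matches box exactly and box lies inside the image."""
    left, upper, right, lower = box
    width, height = image_size
    inside = 0 <= left <= right <= width and 0 <= upper <= lower <= height
    return inside and box_size(box) == tuple(region_size)
```

For the box used above, box_size((100, 100, 400, 400)) gives (300, 300), matching the 300x300 region mentioned in the text.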
Here's an additional example:

Rolling an image

    def roll(image, delta):
        "Roll an image sideways"

        xsize, ysize = image.size

        delta = delta % xsize
        if delta == 0:
            return image

        part1 = image.crop((0, 0, delta, ysize))
        part2 = image.crop((delta, 0, xsize, ysize))
        image.paste(part2, (0, 0, xsize-delta, ysize))
        image.paste(part1, (xsize-delta, 0, xsize, ysize))

        return image

For more advanced tricks, the paste method can also take a transparency mask as an optional argument. In this mask, the value 255 indicates that the pasted image is opaque in that position (that is, the pasted image should be used as is). The value 0 means that the pasted image is completely transparent. Values in between indicate different levels of transparency.

The Python Imaging Library also allows you to work with the individual bands of a multi-band image, such as an RGB image. The split method creates a set of new images, each containing one band from the original multi-band image. The merge function takes a mode and a tuple of images, and combines them into a new image. The following sample swaps the three bands of an RGB image:

Splitting and merging bands

    r, g, b = im.split()
    im = Image.merge("RGB", (b, g, r))

Note that for a single-band image, split returns the image itself. To work with individual colour bands, you may want to convert the image to "RGB" first.
Geometrical Transforms

The Image class contains methods to resize and rotate an image. The former takes a tuple giving the new size, the latter the angle in degrees counter-clockwise.

Simple geometry transforms

    out = im.resize((128, 128))
    out = im.rotate(45) # degrees counter-clockwise

To rotate the image in 90 degree steps, you can either use the rotate method or the transpose method. The latter can also be used to flip an image around its horizontal or vertical axis.

Transposing an image

    out = im.transpose(Image.FLIP_LEFT_RIGHT)
    out = im.transpose(Image.FLIP_TOP_BOTTOM)
    out = im.transpose(Image.ROTATE_90)
    out = im.transpose(Image.ROTATE_180)
    out = im.transpose(Image.ROTATE_270)

There's no difference in performance or result between transpose(ROTATE) and the corresponding rotate operations.

A more general form of image transformations can be carried out via the transform method. See the reference section for details.
Colour Transforms

The Python Imaging Library allows you to convert images between different pixel representations using the convert function.

Converting between modes

    im = Image.open("lena.ppm").convert("L")

The library supports transformations between each supported mode and the "L" and "RGB" modes. To convert between other modes, you may have to use an intermediate image (typically an "RGB" image).
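To give a feel for what a conversion from "RGB" to "L" does per pixel, here is a pure-Python sketch. It assumes the ITU-R 601-2 luma weights that the library's documentation gives for this conversion; the function name luma is hypothetical, and the library's own rounding may differ slightly from the integer division used here.

```python
def luma(rgb):
    """Approximate greyscale value of an (R, G, B) pixel, ITU-R 601-2 weights."""
    r, g, b = rgb
    # Weights sum to 1000, so white stays at 255 and black at 0.
    return (r * 299 + g * 587 + b * 114) // 1000
```

Green contributes most to the result and blue least, which is why a pure blue pixel becomes much darker than a pure green one after conversion.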
Image Enhancement

The Python Imaging Library provides a number of methods and modules that can be used to enhance images.

Filters

The ImageFilter module contains a number of pre-defined enhancement filters that can be used with the filter method.

Applying filters

    from PIL import ImageFilter
    out = im.filter(ImageFilter.DETAIL)

Point Operations

The point method can be used to translate the pixel values of an image (e.g. image contrast manipulation). In most cases, a function object expecting one argument can be passed to this method. Each pixel is processed according to that function:

Applying point transforms

    # multiply each pixel by 1.2
    out = im.point(lambda i: i * 1.2)

Using the above technique, you can quickly apply any simple expression to an image. You can also combine the point and paste methods to selectively modify an image:

Processing individual bands
    # split the image into individual bands
    source = im.split()

    R, G, B = 0, 1, 2

    # select regions where red is less than 100
    mask = source[R].point(lambda i: i < 100 and 255)

    # process the green band
    out = source[G].point(lambda i: i * 0.7)

    # paste the processed band back, but only where red was < 100
    source[G].paste(out, None, mask)

    # build a new multiband image
    im = Image.merge(im.mode, source)

Note the syntax used to create the mask:

    imout = im.point(lambda i: expression and 255)

Python only evaluates as much of a logical expression as is necessary to determine the outcome, and returns the last value examined as the result of the expression. So if the expression above is false (0), Python does not look at the second operand, and thus returns 0. Otherwise, it returns 255.
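The short-circuit behaviour can be tried on plain integers, without an image. The sketch below applies the same "expression and 255" pattern to a few sample red-band values; the function name red_mask_value and the sample list are made up for illustration.

```python
def red_mask_value(i, threshold=100):
    """Mask value for one pixel: 255 where i < threshold, otherwise 0."""
    # "i < threshold and 255" yields False or 255; int() turns False into 0.
    return int(i < threshold and 255)


# The same expression applied to a few sample red-band values.
samples = [0, 99, 100, 200]
mask = [red_mask_value(v) for v in samples]
```

Strictly speaking the false case yields False rather than the integer 0, but False compares equal to 0, so the mask behaves as the text describes.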
Enhancement

For more advanced image enhancement, you can use the classes in the ImageEnhance module. Once created from an image, an enhancement object can be used to quickly try out different settings.

You can adjust contrast, brightness, colour balance and sharpness in this way.

Enhancing images

    from PIL import ImageEnhance

    enh = ImageEnhance.Contrast(im)
    enh.enhance(1.3).show("30% more contrast")
Image Sequences

The Python Imaging Library contains some basic support for image sequences (also called animation formats). Supported sequence formats include FLI/FLC, GIF, and a few experimental formats. TIFF files can also contain more than one frame.

When you open a sequence file, PIL automatically loads the first frame in the sequence. You can use the seek and tell methods to move between different frames:

Reading sequences

    from PIL import Image

    im = Image.open("animation.gif")
    im.seek(1) # skip to the second frame

    try:
        while 1:
            im.seek(im.tell()+1)
            # do something to im
    except EOFError:
        pass # end of sequence

As seen in this example, you'll get an EOFError exception when the sequence ends.

Note that most drivers in the current version of the library only allow you to seek to the next frame (as in the above example). To rewind the file, you may have to reopen it.

The following iterator class lets you use the for-statement to loop over the sequence:

A sequence iterator class

    class ImageSequence:
        def __init__(self, im):
            self.im = im
        def __getitem__(self, ix):
            try:
                if ix:
                    self.im.seek(ix)
                return self.im
            except EOFError:
                raise IndexError # end of sequence

    for frame in ImageSequence(im):
        # ...do something to frame...
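The iterator class works because a for-loop over an object that only defines __getitem__ calls it with 0, 1, 2, ... until IndexError is raised. The sketch below shows this with a stand-in object instead of a real image file; FakeSequence is hypothetical and only mimics the seek and tell behaviour described above.

```python
class FakeSequence:
    """Stand-in for an opened multi-frame image (hypothetical, for illustration)."""

    def __init__(self, n_frames):
        self.n_frames = n_frames
        self.frame = 0

    def seek(self, index):
        # Mimic the library: seeking past the last frame raises EOFError.
        if index >= self.n_frames:
            raise EOFError("no more frames")
        self.frame = index

    def tell(self):
        return self.frame


class ImageSequence:
    def __init__(self, im):
        self.im = im

    def __getitem__(self, ix):
        try:
            if ix:
                self.im.seek(ix)
            return self.im
        except EOFError:
            raise IndexError  # end of sequence


# The loop stops by itself when __getitem__ raises IndexError.
visited = [frame.tell() for frame in ImageSequence(FakeSequence(3))]
```

Note that every iteration returns the same underlying object, positioned at a different frame, so a frame has to be processed before the loop advances.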
Chapter 4: Core Program Source Code

Importing key modules

    import sys
    from PyQt4.QtCore import *
    from PyQt4.QtGui import *
    from pytesser import *
    from PIL import Image

UI Implementation

    class filedialogdemo(QWidget):
        def __init__(self, parent=None):
            super(filedialogdemo, self).__init__(parent)

            layout = QVBoxLayout()

            self.le = QLabel("Optical Character Recognition - GNDIT Project")
            self.le.setAlignment(Qt.AlignCenter)

            # size of window
            self.resize(250, 300)
            layout.addStretch(1)
            self.btn = QPushButton("Click here to select Picture")
            self.btn.resize(self.btn.sizeHint())
            #self.btn.move(550, 100)
            self.btn.setFixedWidth(300)
            self.btn.clicked.connect(self.getfile)
            # center aligns the window
            self.move(QApplication.desktop().screen().rect().center() - self.rect().center())

            # path of the selected picture; empty until the user picks one
            self.path = ''

            self.btn1 = QPushButton("Show data")
            self.btn1.setFixedWidth(300)
            self.btn1.clicked.connect(lambda: self.ocr(self.path))

            layout.addWidget(self.le)
            layout.addWidget(self.btn)
            layout.addWidget(self.btn1)

            # text box that displays the recognized text
            self.contents = QTextEdit()
            self.contents.setMaximumHeight(70)
            layout.addWidget(self.contents)
            self.setLayout(layout)
            self.setWindowTitle("Optical Character Recognition")
File Picker Implementation

        def getfile(self):
            fname = QFileDialog.getOpenFileName(self, 'Open file',
                '/home/kumaramit1996/Pictures/', "Images (*.png *.jpg *.gif)")
            # show a preview of the selected picture in the label
            pixmap = QPixmap(fname)
            pixmap = pixmap.scaledToWidth(300)
            self.le.setPixmap(pixmap)
            self.move(QApplication.desktop().screen().rect().center() - self.rect().center())
            self.path = str(fname)

        # File picker implementation (text files)
        def getfiles(self):
            dlg = QFileDialog()
            dlg.setFileMode(QFileDialog.AnyFile)
            dlg.setFilter('Text files (*.txt)')
            filenames = QStringList()

            if dlg.exec_():
                filenames = dlg.selectedFiles()
                f = open(filenames[0], 'r')

                with f:
                    data = f.read()
                    self.contents.setText(data)
OCR Conversions

        def ocr(self, path):
            # compare by value; "is not" tests identity and is unreliable for strings
            if path != '':
                im = Image.open(path)
                text = image_to_string(im)
                text = image_file_to_string(path)
                # only the result of this last call is displayed
                text = image_file_to_string(path, graceful_errors=True)
                self.contents.setText(text)

Main implementation

    def main():
        app = QApplication(sys.argv)
        ex = filedialogdemo()
        ex.show()
        sys.exit(app.exec_())

Calling Main

    if __name__ == '__main__':
        main()
Chapter 5: Live Example

Fig - Source Image (image whose text is to be fetched)

Fig - Beta Interface of pyOCR

Fig - Picking Source File

Fig - Beta Interface (before conversion)

Fig - Beta Interface (converted text is selected)
Conclusions

6.1 Results

After successfully creating our English character and language models, we assessed the accuracy of the pyOCR software. We were able to recognize English special characters and increase the overall accuracy. We used a character-based approach to assess the accuracy and increased the rate of correct recognition by 8%. The original accuracy with the English character model was 66% on a sample of 100 characters, and we increased this to 74%.
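A character-based accuracy figure of this kind can be computed by comparing the recognized text with the expected text one position at a time. The report does not spell out the exact procedure, so the sketch below is only one plausible reading; the function name character_accuracy is hypothetical, and evaluations that must cope with inserted or missing characters usually rely on an edit distance instead.

```python
def character_accuracy(recognized, expected):
    """Fraction of positions where the recognized character equals the expected one."""
    if not expected:
        raise ValueError("expected text must not be empty")
    # zip stops at the shorter string, so missing characters count as errors.
    correct = sum(1 for r, e in zip(recognized, expected) if r == e)
    return correct / float(len(expected))
```

With 66 of 100 characters correct the function returns 0.66, which corresponds to the 66% baseline quoted above.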
6.2 Conclusions on pyOCR

The goal of pyOCR is to provide an accessible, flexible, and simple tool to perform optical character recognition. In its current state, it is not the most user-friendly utility and still has many kinks to work out. This is understandable because it is in an alpha stage of development and will require more attention before an official release. The actual theory behind character recognition is in place in the software. pyOCR does an amazing job preprocessing and segmenting images and allows for many fine adjustments to fulfill a variety of user needs. It is now just a matter of reorganizing and optimizing the code to create a user-friendly experience. With time, we believe pyOCR will be one of the leading names in optical character recognition software.
6.3 Future Work

As we expect to extend the current version of pyOCR to the Devanagari script, we are getting familiar with the types of challenges presented by accented characters and are trying to deal with them successfully. We thus anticipate that a future extension of pyOCR to most languages based on Asian scripts will be simple to build on the current version.

For languages with different alphabets, like Chinese and Arabic, we think it possible for a future project to adapt pyOCR to vertical and right-to-left character recognition, since at the language model level we defined Unicode to be the standard encoding. This is consistent with the need to represent most written languages in a single encoding for further extensions to other languages. The training portion will then be the key for both the correct representation and the clustering of any new set of characters or alphabets.

As mentioned in section 2.3.2, the pyOCR software is run through multiple commands that represent each step of the recognition process, starting from preprocessing and segmentation and ending with the use of character and language models. We believe it would be very useful to streamline these commands under a single command. This can save a lot of time during future revisions of the software, as extensive testing requires running it multiple times. Such a command can take flags for the different operations within the digitization pipeline, and when omitted they will have default values for ease of use.
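One way such a single command could look is sketched below with the standard argparse module. Every flag name and default value here (--lang, --skip-preprocess, --output, "eng", "out.txt") is a hypothetical placeholder chosen for illustration, not an existing pyOCR option.

```python
import argparse


def build_parser():
    """Command-line parser for a hypothetical single-command OCR pipeline."""
    parser = argparse.ArgumentParser(description="Run every OCR stage in one command.")
    parser.add_argument("image", help="path to the scanned image")
    # Each optional flag has a default, so a bare invocation still works.
    parser.add_argument("--lang", default="eng", help="language model to use")
    parser.add_argument("--skip-preprocess", action="store_true",
                        help="skip the preprocessing stage")
    parser.add_argument("--output", default="out.txt", help="where to write the text")
    return parser


# Only the image path is required; omitted flags fall back to their defaults.
args = build_parser().parse_args(["page.png"])
```

Keeping every stage behind an optional flag means repeated test runs need only the image path, while individual stages can still be tuned or skipped when needed.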