Hiya Everyone!!
This post briefly explains the work I did during Google Summer of Code 2019 with coala. For the uninitiated, my project was to give coala the capability to handle nested programming languages in a single file.
I must say, there is a huge contrast between the project proposal I submitted and the actual implementation. Though the core concept remains the same, the implementation of the various parts is a little different.
Please note that this project currently only supports files that combine Python and Jinja2. Work is in progress to support other languages.
So let's dive into the project and see what has been done :D
This part is for the people who are really interested in trying out this feature (I know most of you aren't xD). Follow the steps below:
NOTE: Activate a virtual environment if you don't want your working coala installation to be disrupted by this feature from hell xD (it is still awaiting review).
From the root of your local coala repository containing this work, run:

```shell
pip install -r requirements.txt
pip3 install -e .
```
Once you have followed the above steps, you have coala with the ability to handle nested languages. If you aren't bored already, go on to the next section to find out how to use this awesome featureee (shameless self-praise xD).
Use the following file to test this mode:
```
def hello():
    {%set x='thanos'%} {{var}} print( "yo!!") {%set y='he rocks'%}


print("Holaaa Amigos" )
```
Save the above snippet in a file named test.py. We'll now go ahead and run PEP8Bear, Jinja2Bear and SpaceConsistencyBear on the file.
You can do that by using:
```shell
coala --handle-nested --languages=python,jinja2 --files=test.py \
      --bears=PEP8Bear,Jinja2Bear,SpaceConsistencyBear
```
And tada!!! coala will prompt you to apply the patches, just as it does in a normal run.
The --handle-nested argument specifies that the files contain a combination of nested languages. The --languages argument lists the languages that exist in the files. Note that you must always pass --languages, since coala cannot auto-detect the languages present in a nested language file.
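To make the relationship between the two flags concrete, here is a minimal sketch using Python's standard argparse. It is an illustration under stated assumptions, not coala's actual CLI code: the function name parse_nested_args is hypothetical, and it models only the four flags used in the command above.

```python
import argparse


def parse_nested_args(argv):
    """Parse the nested-language flags (hypothetical sketch, not coala's CLI)."""
    parser = argparse.ArgumentParser(prog='coala')
    parser.add_argument('--handle-nested', action='store_true',
                        help='the files mix several languages')
    parser.add_argument('--languages', default='',
                        help='comma-separated languages present in the files')
    parser.add_argument('--files', default='')
    parser.add_argument('--bears', default='')
    args = parser.parse_args(argv)

    # Nested languages cannot be auto-detected, so nested mode is
    # meaningless without an explicit --languages list.
    if args.handle_nested and not args.languages:
        parser.error('--handle-nested requires --languages')

    # Turn the comma-separated strings into lists, dropping empty items.
    for name in ('languages', 'files', 'bears'):
        values = getattr(args, name).split(',')
        setattr(args, name, [item for item in values if item])
    return args


args = parse_nested_args(['--handle-nested', '--languages=python,jinja2',
                          '--files=test.py',
                          '--bears=PEP8Bear,Jinja2Bear,SpaceConsistencyBear'])
print(args.languages)  # ['python', 'jinja2']
```

Calling it with --handle-nested but without --languages makes argparse print a usage error and exit, which mirrors the rule stated above.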
Special care has also been taken to leave the significant parts of core coala untouched by my implementation. No major changes have been made to the way coala works. This helps because this is still an experimental feature and many changes are yet to come.
NOTE: An asciinema recording of this feature in action is here
Whoo! So you do wanna jump along with Alice down the rabbit hole!! Noiceee...
Let's go on in then.
Before we go in, lemme give you a very high-level view of what happens behind the scenes.
A source file containing multiple programming languages is loaded up by coala. The original nested file is broken/split into n temporary files, where n is the number of languages present in the original file. Each temporary file only contains the snippets belonging to one language.
These temporary files are then passed to their respective language bears (chosen by the user), where the static analysis routines run. After being linted, the temporary files are assembled back into one file.
This process creates the illusion that the original file is being linted as a whole, but in reality we divide the file into different parts and lint them one by one.
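The split, lint and assemble idea can be sketched in a few lines. This is a toy, in-memory model under stated assumptions: sections are given as (language, text) pairs, the "linters" are hypothetical stand-in functions rather than real coala bears, and for simplicity each snippet is linted on its own instead of writing a real temporary file and linting it.

```python
from collections import defaultdict


def split_by_language(sections):
    """Group (language, text) sections into one bucket per language.

    Each bucket plays the role of a temporary file; the section index is kept
    so the linted text can be put back in its original position.
    """
    buckets = defaultdict(list)
    for index, (language, text) in enumerate(sections):
        buckets[language].append((index, text))
    return dict(buckets)


def lint_and_assemble(sections, linters):
    """Lint every language bucket with its own linter, then reassemble."""
    linted = [None] * len(sections)
    for language, snippets in split_by_language(sections).items():
        lint = linters[language]
        for index, text in snippets:
            linted[index] = lint(text)
    return ''.join(linted)


def strip_trailing_spaces(text):
    # Stand-in for a Python bear: remove trailing whitespace on every line.
    return '\n'.join(line.rstrip() for line in text.split('\n'))


def space_out_jinja_tags(text):
    # Stand-in for a Jinja2 bear: turn '{%set' into '{% set'.
    return text.replace('{%set', '{% set')


sections = [
    ('python', 'x = 1   \n'),
    ('jinja2', '{%set y = 2 %}\n'),
    ('python', 'print(x)  \n'),
]
linters = {'python': strip_trailing_spaces, 'jinja2': space_out_jinja_tags}
print(lint_and_assemble(sections, linters), end='')
```

Keeping the original index of every snippet is what lets the assembled output look as if the nested file had been linted in one piece.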
The figure below should help give a clearer picture.
Now let's have a look at the different parts of the architecture, shall we??
NlInfoExtractor is responsible for parsing the CLI arguments passed to coala in Nested Language Mode. It extracts all the information from the arguments that coala will later need to lint the file properly.
For example, suppose the user passes the following arguments to coala:
```shell
coala --handle-nested --languages=python,jinja2 --files=test.py,test2.py \
      --bears=PEP8Bear,SpaceConsistencyBear,Jinja2Bear \
      --settings use_space=True
```
From the above arguments, we can extract the following information:
```python
{
    'bears': ['PEP8Bear', 'SpaceConsistencyBear', 'Jinja2Bear'],
    'files': ['test.py', 'test2.py'],
    'lang_bear_dict': {
        'jinja2': ['Jinja2Bear'],
        'python': ['PEP8Bear', 'SpaceConsistencyBear']
    },
    'languages': ['python', 'jinja2'],
    'nl_file_info': {
        'test.py': {
            'python': 'test.py_nl_python',
            'jinja2': 'test.py_nl_jinja2'
        },
        'test2.py': {
            'python': 'test2.py_nl_python',
            'jinja2': 'test2.py_nl_jinja2'
        }
    }
}
```
This information is then used to create the coala sections. In layman's terms, a coala section is the information passed to coala that tells it which files to lint, what type they are, and which linters (bears) it needs to initialize to lint them.
The implementation for this can be found here
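A minimal sketch of how the dictionary above could be derived from the parsed arguments follows. The bear-to-language table BEAR_LANGUAGES and the function extract_nl_info are hypothetical; the table simply mirrors the example output (SpaceConsistencyBear is mapped to python only because the example does so). Only the filename_nl_language naming of the temporary files is taken from the example.

```python
# Hypothetical bear -> language table that mirrors the example output above.
BEAR_LANGUAGES = {
    'PEP8Bear': 'python',
    'SpaceConsistencyBear': 'python',
    'Jinja2Bear': 'jinja2',
}


def extract_nl_info(files, languages, bears, bear_languages=BEAR_LANGUAGES):
    """Collect what later stages need from the parsed CLI arguments."""
    # Which bears should run on which language's temporary file.
    lang_bear_dict = {language: [] for language in languages}
    for bear in bears:
        lang_bear_dict[bear_languages[bear]].append(bear)

    # One temporary file per (original file, language) pair.
    nl_file_info = {
        filename: {language: f'{filename}_nl_{language}'
                   for language in languages}
        for filename in files
    }
    return {
        'bears': bears,
        'files': files,
        'lang_bear_dict': lang_bear_dict,
        'languages': languages,
        'nl_file_info': nl_file_info,
    }


info = extract_nl_info(['test.py', 'test2.py'], ['python', 'jinja2'],
                       ['PEP8Bear', 'SpaceConsistencyBear', 'Jinja2Bear'])
print(info['nl_file_info']['test2.py'])
# {'python': 'test2.py_nl_python', 'jinja2': 'test2.py_nl_jinja2'}
```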
NlCliParsing is responsible for creating coala sections from the information produced by NlInfoExtractor.
For example, for the above argument list, the section created for the test.py_nl_python temporary file looks like this:
```ini
files = test.py_nl_python
bears = PEP8Bear,SpaceConsistencyBear
handle_nested = True
languages = python,jinja2
file_lang = python
orig_file_name = test.py
```
It is also important to note that we create one section per temporary file. That means the above arguments produce 4 coala sections: two original files, each split into a Python and a Jinja2 temporary file.
The implementation for this can be found here
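The following sketch illustrates the one-section-per-temporary-file idea. coala represents sections with its own objects internally; here the standard library's configparser is used purely to render the same key/value layout as an INI file. The section naming scheme (test.py_python) and the function name build_nl_sections are hypothetical.

```python
import configparser
import io

# The information extracted in the previous step (copied from the example).
NL_INFO = {
    'languages': ['python', 'jinja2'],
    'lang_bear_dict': {
        'python': ['PEP8Bear', 'SpaceConsistencyBear'],
        'jinja2': ['Jinja2Bear'],
    },
    'nl_file_info': {
        'test.py': {'python': 'test.py_nl_python',
                    'jinja2': 'test.py_nl_jinja2'},
        'test2.py': {'python': 'test2.py_nl_python',
                     'jinja2': 'test2.py_nl_jinja2'},
    },
}


def build_nl_sections(nl_info):
    """Create one section per temporary file (hypothetical naming scheme)."""
    config = configparser.ConfigParser()
    for orig_file, temp_files in nl_info['nl_file_info'].items():
        for language, temp_file in temp_files.items():
            config[f'{orig_file}_{language}'] = {
                'files': temp_file,
                'bears': ','.join(nl_info['lang_bear_dict'][language]),
                'handle_nested': 'True',
                'languages': ','.join(nl_info['languages']),
                'file_lang': language,
                'orig_file_name': orig_file,
            }
    return config


sections = build_nl_sections(NL_INFO)
buffer = io.StringIO()
sections.write(buffer)
print(buffer.getvalue())
print(len(sections.sections()))  # 4
```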
The Parser is used to segregate the original file into the temporary files.
Let's understand what this means:
Suppose we have the following nested file, which contains both Jinja and Python:
```
import sys
import thanos

x = "hulk"

{% if x == "hulk" %}
    print("Thanos is meat")
{% else %}
    print("Bhubyee! Earth")
{% endif %}

if(y is not x):
    print("Some gibberish")
```
As you can see, the above snippet is a mixture of Python and Jinja. If we want to lint these separately, i.e. convert the above file into a pure Jinja file and a pure Python file, we need to keep track of which lines belong to which language.
We do this by dividing the file into different sections. I call these sections NlSections, which stands for Nested Language Sections.
The above snippet can be divided into three sections:
Section 1 (Python, lines 1-4):

```
import sys
import thanos

x = "hulk"
```

Section 2 (Jinja, lines 6-10):

```
{% if x == "hulk" %}
    print("Thanos is meat")
{% else %}
    print("Bhubyee! Earth")
{% endif %}
```

Section 3 (Python, lines 12-13):

```
if(y is not x):
    print("Some gibberish")
```
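How exactly the Parser decides where a Jinja section begins and ends isn't described here, so the following is only an assumed, naive line-based heuristic for illustration (find_sections is a hypothetical name): a line belongs to Jinja if it starts with a Jinja delimiter or sits inside an open {% if %}/{% for %}-style block, and blank lines never start a new section. It reproduces the three sections above, but it does not handle inline Jinja inside a Python line, as in the earlier test.py example; a sturdier approach would build on Jinja2's own lexer.

```python
import re

# Jinja tags that open / close a block ({% if %} ... {% endif %}, etc.).
BLOCK_OPEN = re.compile(r'\{%-?\s*(?:if|for|block|macro|with)\b')
BLOCK_CLOSE = re.compile(r'\{%-?\s*end\w+')
JINJA_START = ('{%', '{{', '{#')


def find_sections(lines):
    """Return (language, line_start, line_end) tuples, 1-based and inclusive.

    Naive heuristic: a line is Jinja if it starts with a Jinja delimiter or
    lies inside an open Jinja block; everything else is Python.
    """
    labels = []
    depth = 0
    for line in lines:
        stripped = line.strip()
        if not stripped:
            labels.append(None)  # blank lines never start a new section
            continue
        is_jinja = depth > 0 or stripped.startswith(JINJA_START)
        depth += len(BLOCK_OPEN.findall(line)) - len(BLOCK_CLOSE.findall(line))
        labels.append('jinja2' if is_jinja else 'python')

    sections = []
    for number, label in enumerate(labels, start=1):
        if label is None:
            continue
        if sections and sections[-1][0] == label:
            # Same language as the previous non-blank line: extend the section.
            sections[-1] = (label, sections[-1][1], number)
        else:
            sections.append((label, number, number))
    return sections


NESTED_FILE = '''import sys
import thanos

x = "hulk"

{% if x == "hulk" %}
    print("Thanos is meat")
{% else %}
    print("Bhubyee! Earth")
{% endif %}

if(y is not x):
    print("Some gibberish")
'''

print(find_sections(NESTED_FILE.splitlines()))
# [('python', 1, 4), ('jinja2', 6, 10), ('python', 12, 13)]
```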
Dividing the file into sections makes linting it easier. After creating the sections, we group them by language and build a pure Python file and a pure Jinja file. These pure files are then passed to the respective language bears.
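One simple way to build the pure files, assumed here for illustration, is to keep every file the same length as the original and blank out the lines that belong to other languages, so a line number reported by a bear maps straight back to the original file. build_pure_files is a hypothetical helper; sections are (language, line_start, line_end) tuples with 1-based, inclusive line numbers.

```python
def build_pure_files(lines, sections):
    """Build one 'pure' file per language from (language, start, end) sections.

    Lines owned by other languages become empty lines, so line N of every pure
    file is line N of the original nested file.
    """
    pure = {}
    for language, start, end in sections:
        target = pure.setdefault(language, [''] * len(lines))
        for number in range(start, end + 1):
            target[number - 1] = lines[number - 1]
    return {language: '\n'.join(content) + '\n'
            for language, content in pure.items()}


LINES = [
    'import sys',
    '',
    '{% if debug %}',
    '    print("debugging")',
    '{% endif %}',
    'print(sys.argv)',
]
SECTIONS = [('python', 1, 1), ('jinja2', 3, 5), ('python', 6, 6)]

pure_files = build_pure_files(LINES, SECTIONS)
print(pure_files['python'], end='')
print(pure_files['jinja2'], end='')
```

The trade-off of this design is that the blanked-out lines can themselves trigger warnings (for example pycodestyle's "too many blank lines" check), which a real implementation would need to filter out or avoid.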
NOTE: Creating sections does not mean that we physically cut the file into new pieces; rather, we keep track of information such as the line_start and line_end of each section in the original file.
Each section we identify in the nested language file is an NlSection.
An NlSection holds information about one section of the nested file, including where it starts and where it ends.
The start and end of a section are in turn NlSectionPosition objects. That means an NlSection contains two objects, start and end, both of type NlSectionPosition.
Using a dedicated NlSectionPosition class makes it easier to store and work with line and column numbers, and the class is properly tested.
Implementation for NlSection is here
Implementation for NlSectionPosition is here
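Here is a sketch of the two classes as Python dataclasses. The field names (language, start, end, line, column) and the contains helper are assumptions for illustration, not the actual attributes of coala's NlSection and NlSectionPosition.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class NlSectionPosition:
    """A 1-based (line, column) position in the original nested file."""
    line: int
    column: int = 1

    def __post_init__(self):
        if self.line < 1 or self.column < 1:
            raise ValueError('line and column numbers start at 1')


@dataclass(frozen=True)
class NlSection:
    """A contiguous part of the original file written in one language."""
    language: str
    start: NlSectionPosition
    end: NlSectionPosition

    def __post_init__(self):
        if self.end < self.start:
            raise ValueError('a section cannot end before it starts')

    def contains(self, position):
        """True if `position` lies inside this section (bounds inclusive)."""
        return self.start <= position <= self.end


jinja_section = NlSection('jinja2', NlSectionPosition(6),
                          NlSectionPosition(10, 12))
print(jinja_section.contains(NlSectionPosition(8, 5)))  # True
print(jinja_section.contains(NlSectionPosition(11)))    # False
```

Because order=True compares positions field by field, (line, column) ordering comes for free, which keeps the containment and validation checks to a single comparison each.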
NlCore is the interface that connects normal coala with this nested language mode, which helps us keep the two separate. All calls to the different parts of the nested language architecture are made via the APIs provided by this module.
The implementation for this can be found here
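The value of NlCore is that regular coala talks to a single facade instead of to each nested-language component. A hypothetical sketch of that shape follows: the components are injected as plain callables, and every method name here (is_nested_mode, prepare, finish) is invented for illustration rather than taken from NlCore's actual API.

```python
from types import SimpleNamespace


class NlCore:
    """Hypothetical facade: regular coala only talks to this class.

    The nested-language components are injected as plain callables, so the
    rest of coala never imports or calls them directly.
    """

    def __init__(self, extract_info, split_file, make_sections, assemble):
        self._extract_info = extract_info
        self._split_file = split_file
        self._make_sections = make_sections
        self._assemble = assemble

    @staticmethod
    def is_nested_mode(args):
        return bool(getattr(args, 'handle_nested', False))

    def prepare(self, args):
        """Extract info, split every file, and return the coala sections."""
        info = self._extract_info(args)
        for filename in info['files']:
            self._split_file(filename, info)
        return info, self._make_sections(info)

    def finish(self, info, linted_files):
        """Reassemble the linted temporary files into the original files."""
        return self._assemble(info, linted_files)


# Usage with dummy components standing in for the real modules.
split_calls = []
core = NlCore(
    extract_info=lambda args: {'files': args.files},
    split_file=lambda filename, info: split_calls.append(filename),
    make_sections=lambda info: [f'{name}_section' for name in info['files']],
    assemble=lambda info, linted: {'test.py': 'assembled'},
)
args = SimpleNamespace(handle_nested=True, files=['test.py'])
if NlCore.is_nested_mode(args):
    info, sections = core.prepare(args)
    print(sections)  # ['test.py_section']
    print(core.finish(info, linted_files={}))  # {'test.py': 'assembled'}
```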
Anddd!! That's it, folks. That covers all the important parts that were used to make this possible. It's really a very high-level view of what has actually been done. I didn't want to go into the details of each and every module because doing so would have gotten you really confused :P. But if you are interested in knowing more, feel free to contact me :)
The most challenging part of the project was coming up with an architecture that would actually work. I must confess that I put in more than 2 months before GSoC trying out various possibilities that might make this project a reality. I am grateful to all the coala mentors who kept reviewing my various approaches despite their busy schedules; without their insights I would never have been able to finalise my proposal.
Since this project deals with the core of coala, I was pretty scared to get into the waters of such a huge codebase. But with the constant support of the mentors, and as time went on, I was able to understand how the different components of coala actually piece together.
Another difficult situation I faced was during the third phase of GSoC. The approach I had planned for assembling the files did not work, so I had to come up with a new approach within 11 days. I was pretty pissed and scared, but it took less time than I expected. This taught me not to overestimate the time a task needs and to take things one at a time.
And last but not least, there was maintaining test coverage while implementing the details. It took me some time to get used to keeping everything covered, but it all turned out well.
The entire experience was really fun. Though there were a few hair-pulling moments, it taught me a lot about how architectures should be designed and how components should be built with scalability in mind.
I'll also be writing another post about my overall experience. Until then!! Naveen signing offf. Bhubyeee