# Programming Languages

**The number and kinds of programming languages provide insight into the skills required of code contributors and the nature of the projects themselves.** This metric can help newcomers navigate open source projects, and it enables project and product managers to assess a project's profile in the context of their own experience and organisations. It can also help students decide which programming languages to focus on learning, depending on their topic of interest.

```python
import numpy as np
import pandas as pd
import plotly.io as pio
import plotly.graph_objects as go
import plotly.express as px

# Project-specific styling (colour palettes, border colour, logo) used below
from opensustainTemplate import *
```

```python
# Load the project analysis data set
df_active = pd.read_csv("../csv/project_analysis.csv")
```

```python
# Count how many projects have each language as their dominating language.
# reset_index(name=...) yields the same column names in pandas 1.x and 2.x.
language_counts = (
    df_active["dominating_language"]
    .value_counts()
    .rename_axis("dominating_language")
    .reset_index(name="project_count")
)

# Keep only languages that dominate more than four projects; the pie
# percentages are therefore relative to this filtered subset.
language_counts = language_counts[language_counts["project_count"] > 4]

fig = px.pie(
    language_counts,
    values="project_count",
    names="dominating_language",
    color_discrete_sequence=color_discrete_sequence,  # from opensustainTemplate
    hole=0.2,
)

fig.update_layout(
    showlegend=False,
    font_size=16,
    dragmode=False,
    margin=dict(l=0, r=0, b=0, t=0),
)
fig.update_traces(
    textposition="inside",
    textinfo="percent+label",
    marker=dict(line=dict(color=boarder_color, width=1)),  # from opensustainTemplate
)

# Export as SVG from the modebar and resize the chart with the page
config = {
    "toImageButtonOptions": {"format": "svg"},  # one of png, svg, jpeg, webp
    "responsive": True,
}
fig.show(config=config)
```

```{figure} data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
:figclass: caption-hack
:name: languages-distribution

\- Distribution of programming languages
```
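
The percentages in the pie chart are computed by Plotly over the languages that pass the `> 4` filter, not over every project in `df_active`. A minimal sketch with a small, entirely hypothetical list of dominating languages illustrates how the two ways of computing shares differ:

```python
import pandas as pd

# Hypothetical toy data: the dominating language of 12 made-up projects
languages = pd.Series(["Python"] * 6 + ["R"] * 5 + ["Julia"])

counts = languages.value_counts()

# Share of each language over all projects
share_all = counts / counts.sum() * 100

# Share over languages dominating more than four projects,
# which is what the pie chart displays
kept = counts[counts > 4]
share_kept = kept / kept.sum() * 100

print(share_all.round(1).to_dict())
print(share_kept.round(1).to_dict())
```

Dropping rare languages shifts every remaining share upwards, so the filter threshold should be kept in mind when comparing these percentages with external statistics.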

**Python dominates the OSS movement for sustainability and is the dominating language in 39.8% of projects, followed by R (16.7%), Jupyter notebooks (9.34%) and other languages such as Fortran, C++ and Java.** Statistics from [GitHut 2.0](https://madnight.github.io/githut/#/pull_requests/2022/1) and GitHub's [official numbers](https://octoverse.github.com/#geographical-distribution-of-active-users) give an overview of programming language usage across open source projects. Compared with the broader open source ecosystem, Python is used considerably more often in the repositories analysed here than widely popular languages such as JavaScript. This indicates a strong focus on analysing large datasets, where Python and Jupyter notebooks are increasingly dominant, and less focus on web applications. Python in particular is the language of choice for projects in energy modelling, biosphere, hydrosphere, wind energy, buildings and heating. Although Python is considered an energy-inefficient programming language, in practice computationally intensive operations are typically offloaded to compiled, more energy-efficient code (e.g., via the C API) by libraries such as NumPy.
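
The offloading described above can be illustrated with a minimal sketch; the function names and array size are arbitrary choices for illustration. The same sum of squares is computed once in a pure-Python loop, where the interpreter executes every iteration, and once with `numpy.dot`, which runs the loop in NumPy's compiled code. The sketch shows where the work happens; it is not an energy measurement.

```python
import time

import numpy as np


def sum_of_squares_python(values):
    # Every iteration is executed by the Python interpreter
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_numpy(array):
    # The loop runs inside NumPy's compiled routines
    return float(np.dot(array, array))


data = np.arange(100_000, dtype=np.float64)

start = time.perf_counter()
slow = sum_of_squares_python(data.tolist())
t_python = time.perf_counter() - start

start = time.perf_counter()
fast = sum_of_squares_numpy(data)
t_numpy = time.perf_counter() - start

print(f"pure Python: {t_python:.4f} s, NumPy: {t_numpy:.4f} s")
```

Both functions return the same value up to floating-point rounding; on typical hardware the NumPy version finishes much faster because the interpreter overhead per element disappears.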

**The use of R deviates markedly from general open source statistics: R is far more prevalent here than in the wider software world.** R projects are concentrated in the topics of biosphere, hydrosphere, water supply, soil and land use, climate, and food and agriculture. This can be attributed to the large number of statistics-oriented data projects in these topics and the small number of general web development projects in the field of Sustainable Development. Despite being more than 65 years old, Fortran is still widely used in the Earth system models applied across the hydrosphere, climate and atmosphere topics. This can be explained by the long development history of these projects and the numerical efficiency such models require for high-performance computing.

**Julia, a relatively new language, also has a wide range of applications.** For some specialised use cases, such as building simulation, languages like Modelica are frequently used.

```python
# Count projects for every (topic, dominating language) pair.
# reset_index(name=...) yields a "counts" column in pandas 1.x and 2.x.
df_language_distribution = df_active.value_counts(
    ["topic", "dominating_language"]
).reset_index(name="counts")

fig = px.scatter(
    df_language_distribution,
    x="dominating_language",
    y="topic",
    size="counts",  # bubble size scales with the number of projects
)

fig.update_layout(
    height=1000,
    width=1200,
    xaxis_title=None,
    yaxis_title=None,
    title="Distribution of programming languages within topics",
    dragmode=False,
)
fig.update_traces(marker_color=marker_color)  # from opensustainTemplate

# Place the project logo (from opensustainTemplate) in the top-right corner
fig.add_layout_image(
    dict(
        source=logo_img,
        xref="paper",
        yref="paper",
        x=1,
        y=1,
        sizex=0.05,
        sizey=0.05,
        xanchor="right",
        yanchor="top",
    )
)

# Export as SVG from the modebar and resize the chart with the page
config = {
    "toImageButtonOptions": {"format": "svg"},  # one of png, svg, jpeg, webp
    "responsive": True,
}
fig.show(config=config)
```

```{figure} data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
:figclass: caption-hack
:name: languages-within-topics

\- Distribution of programming languages within topics
```
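
The bubble chart encodes a topic-by-language contingency table. A minimal sketch, using a few hypothetical rows with the same `topic` and `dominating_language` columns as `df_active`, shows how `pd.crosstab` with `normalize="index"` turns the counts into per-topic shares, which makes concentrations such as R within particular topics easier to compare numerically:

```python
import pandas as pd

# Hypothetical toy rows mirroring the topic / dominating_language columns
df = pd.DataFrame(
    {
        "topic": ["Climate", "Climate", "Climate", "Hydrosphere", "Hydrosphere", "Energy"],
        "dominating_language": ["R", "Python", "Fortran", "R", "R", "Python"],
    }
)

# Absolute number of projects per (topic, language) pair
counts = pd.crosstab(df["topic"], df["dominating_language"])

# Share of each language within a topic (each row sums to 1)
shares = pd.crosstab(df["topic"], df["dominating_language"], normalize="index")

print(counts)
print(shares.round(2))
```

Row-normalised shares remove the effect of topics having very different numbers of projects, which the absolute bubble sizes do not.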